Getting Started

Welcome to Bystro, the world's first natural-language AI platform for genetics and proteomics. Ask questions in plain English and get research-grade analysis instantly. No code required.

New to Bystro?

Start with our onboarding videos for a guided walkthrough, or explore a public dataset to try both platforms without uploading your own data.

Two platforms, one workflow

DASH + THiNK

DASH produces the grounded data layer: annotated, versioned, and reproducible. THiNK reasons on top of it. When THiNK surfaces a finding worth a closer look, jump to DASH to verify it against the raw annotation, then bring that context back into your conversation.

Bystro Classic

DASH

The annotation and dashboard layer. Upload a VCF file and get back a fully annotated dataset with 200+ fields from seven curated databases, covering ancestry, polygenic risk scores, clinical significance, population frequencies, and more. Fast, versioned, reproducible.

200+ annotation fields

RefSeq · ClinVar · gnomAD · dbSNP · CADD, plus more databases

23+ population frequencies

11 gnomAD ancestry groups

What it does

Natural-language variant search

Every one of the 200+ annotation fields is indexed. Filter by any combination of fields across millions of variants and thousands of samples. No query language required.

Cross-ancestry polygenic risk scores

Ancestry-corrected scores from any GWAS with available summary statistics. Built-in studies cover Alzheimer's disease and IBD, or you can bring your own.

Rich pathogenicity scoring

CADD PHRED scores for SNPs and indels, SpliceAI and Pangolin splice predictions, PolyPhen-2 and SIFT missense impact, PhyloP conservation.

Multi-sample cohort support

Single jobs spanning thousands of samples. Arrow Feather v2 dosage matrices ready for downstream ML pipelines and GWAS frameworks.

Automatic QC you can trust

Per-sample Ti/Tv ratios, het/hom ratios, missingness rates, Watterson's theta. Every annotation traceable to a specific database version.
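For readers unfamiliar with the QC metrics listed above, here is a minimal Python sketch of their textbook definitions. It is not Bystro's implementation: the function names, the input shapes (a list of ref/alt pairs, a list of per-site alt-allele counts with None for missing calls), and the toy values are all assumptions made for illustration. Watterson's theta is computed as S / a_n, where S is the number of segregating sites and a_n is the sum of 1/i for i from 1 to n-1 over n sampled chromosomes.

```python
# Illustrative definitions only; names and input formats are hypothetical,
# not part of any Bystro API.

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snps):
    """Transition/transversion ratio over (ref, alt) pairs."""
    ti = tv = 0
    for ref, alt in snps:
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            continue  # skip indels and non-variant records
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("nan")


def het_hom_ratio(genotypes):
    """Heterozygous / homozygous-alt ratio for one sample.

    genotypes holds alt-allele counts per site: 0, 1, 2, or None if missing.
    """
    het = sum(1 for g in genotypes if g == 1)
    hom = sum(1 for g in genotypes if g == 2)
    return het / hom if hom else float("nan")


def missingness(genotypes):
    """Fraction of sites with no genotype call for one sample."""
    if not genotypes:
        return float("nan")
    return sum(1 for g in genotypes if g is None) / len(genotypes)


def wattersons_theta(segregating_sites, n_chromosomes):
    """Watterson's estimator: S / a_n, with a_n = sum_{i=1}^{n-1} 1/i."""
    if n_chromosomes < 2:
        return float("nan")
    a_n = sum(1 / i for i in range(1, n_chromosomes))
    return segregating_sites / a_n


if __name__ == "__main__":
    toy_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("T", "G")]
    toy_genotypes = [0, 1, 1, 2, None, 1, 2, 0]
    print(ti_tv_ratio(toy_snps))          # 4 transitions / 2 transversions = 2.0
    print(het_hom_ratio(toy_genotypes))   # 3 het / 2 hom-alt = 1.5
    print(missingness(toy_genotypes))     # 1 missing of 8 sites = 0.125
    print(f"{wattersons_theta(11, 4):.3f}")  # 11 / (1 + 1/2 + 1/3) = 6.000
```

Note that n_chromosomes counts chromosomes, not samples, so a cohort of diploid samples contributes two per sample.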

View DASH documentation →

Bystro Agent

THiNK

Not a chatbot. A research collaborator. Ask any genetics question in plain language and THiNK reasons on top of your DASH data, searches the published literature, runs custom statistical analyses, generates visualizations, and verifies every claim before it answers.

What it does

Runs real analysis, not just reasoning

THiNK executes Python directly in a persistent sandbox: GWAS, PCA, logistic regression, survival analysis, clustering, de novo variant detection, kinship estimation, proteomics integration. Results and publication-quality figures embedded in the conversation.

Literature synthesis across 10+ databases

PubMed, bioRxiv, ClinVar, gnomAD, OMIM, Reactome, UniProt, KEGG, PharmGKB, and more, queried at runtime. Every claim cited.

Gigabyte-scale file uploads

VCFs, FragPipe TMT proteomics data, spreadsheets, PDFs, BAMs. Any file type, any size.

Multi-turn depth

Refine hypotheses, explore alternatives, and run follow-ups in one continuous thread. Uncertainty flagged. Sources always cited.

What it reads

Embedded in DASH annotation

ClinVar · gnomAD · dbSNP · CADD · SpliceAI · PhyloP

Live internet search

PubMed · bioRxiv · OMIM · UniProt · Reactome · KEGG · PharmGKB · GTEx · STRING

See THiNK examples →

Support & Resources

Documentation

Onboarding Videos: Step-by-step walkthroughs of both platforms
Vignettes: Real-world analysis examples and use cases
Annotation Field Reference: Every annotation field explained with examples

Get Help

Email Support: Direct line to our team
FAQ: Common questions about annotation, search, and data
Community Discord: Ask questions and connect with other Bystro users

Common Questions

Do I need my own data to get started?

No. THiNK can answer your genetics questions with or without your own data. Public datasets such as the 1000 Genomes Project are also available, so you can explore both platforms immediately.

How does Bystro handle missing data?

Null values are never imputed as 0. A query like gnomad.exomes.af < .01 skips sites without reliable gnomAD data. To include those missing sites as well, combine the filter with an OR clause, as in the query gnomad.exomes.af < .01 || !gnomad.exomes
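The missing-data behavior described above can be illustrated with a small Python sketch. This is not Bystro's query engine: the dictionary records, the af_below helper, and the include_missing flag are hypothetical, and only the field name gnomad.exomes.af comes from the example query. The point it shows is that a missing value is neither treated as 0 nor as a match for a less-than comparison unless the missing case is requested explicitly.

```python
# Hypothetical model of null-aware filtering; not Bystro's implementation.

FIELD = "gnomad.exomes.af"  # field name taken from the example query


def af_below(threshold, include_missing=False):
    """Build a predicate that keeps variants with allele frequency < threshold.

    Missing values (absent key or None) are never coerced to 0. They match
    only when include_missing is True, mirroring the OR form of the query.
    """
    def predicate(variant):
        af = variant.get(FIELD)
        if af is None:
            return include_missing
        return af < threshold
    return predicate


# Toy records: two with a frequency, two without reliable gnomAD data.
variants = [
    {"id": "v1", FIELD: 0.002},
    {"id": "v2", FIELD: 0.25},
    {"id": "v3", FIELD: None},
    {"id": "v4"},
]

strict = [v["id"] for v in variants if af_below(0.01)(v)]
inclusive = [v["id"] for v in variants if af_below(0.01, include_missing=True)(v)]

print(strict)     # ['v1']
print(inclusive)  # ['v1', 'v3', 'v4']
```

If missing values were imputed as 0, v3 and v4 would pass the strict filter as if they were confirmed rare variants, which is the outcome the null-aware behavior avoids.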

What genome builds are supported?

Both hg19 (GRCh37) and hg38 (GRCh38) are supported. You select the assembly at upload time on DASH.

View all frequently asked questions →

Ready to get started?

Join researchers worldwide using Bystro to go from raw genetic data to insight, in minutes, not months.

Join our Discord