Getting Started

Welcome to Bystro, the world's first natural language AI platform for genetics and proteomics. Ask questions in plain English and get research-grade analysis instantly. No code required.

New to Bystro?

Start with our onboarding videos for a guided walkthrough, or explore a public dataset to try both platforms without uploading your own data.

Two platforms, one workflow

DASH · THiNK

DASH produces the grounded data layer: annotated, versioned, and reproducible. THiNK reasons on top of it. When THiNK surfaces a finding worth a closer look, jump to DASH to verify it against the raw annotation, then bring that context back into your conversation.

Bystro Classic

DASH

The annotation and dashboard layer. Upload a VCF file and get back a fully annotated dataset with 200+ fields from seven curated databases, covering ancestry, polygenic risk scores, clinical significance, population frequencies, and more. Fast, versioned, reproducible.

200+ annotation fields
RefSeq · ClinVar · gnomAD · dbSNP · CADD + more databases
23+ population frequencies
11 gnomAD ancestry groups

What it does

Natural-language variant search

Every one of the 200+ annotation fields is indexed. Filter by any combination of fields across millions of variants and thousands of samples. No query language required.

Cross-ancestry polygenic risk scores

Ancestry-corrected scores from any GWAS with available summary statistics. Built-in studies for Alzheimer's and IBD, or bring your own.
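
For readers new to polygenic risk scores, here is a minimal sketch of the basic idea: a weighted sum of allele dosages, with weights taken from GWAS summary statistics. It is not Bystro's implementation and deliberately leaves out the ancestry correction mentioned above. The function name, the variant IDs, and the choice to skip missing genotypes are all assumptions made for illustration.

```python
# Illustrative only: a plain additive score with no ancestry correction.
# additive_prs and the variant IDs below are hypothetical.

def additive_prs(dosages, effect_sizes):
    """Sum effect size * dosage over the variants in a GWAS weight table.

    dosages: dict of variant ID -> alt-allele count (0, 1, 2) or None if missing.
    effect_sizes: dict of variant ID -> per-allele effect size (beta).
    """
    score = 0.0
    for variant_id, beta in effect_sizes.items():
        dosage = dosages.get(variant_id)
        if dosage is None:
            # Assumption: a missing genotype is skipped, not treated as 0 copies.
            continue
        score += beta * dosage
    return score


weights = {"var1": 0.5, "var2": -0.25, "var3": 1.0}
sample = {"var1": 2, "var2": 1, "var3": None}
print(additive_prs(sample, weights))
```

Real pipelines typically also normalize the score against a reference distribution; that step is omitted here.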

Rich pathogenicity scoring

CADD PHRED scores for SNPs and indels, SpliceAI and Pangolin splice predictions, PolyPhen-2 and SIFT missense impact, PhyloP conservation.

Multi-sample cohort support

Single jobs spanning thousands of samples. Arrow Feather v2 dosage matrices ready for downstream ML pipelines and GWAS frameworks.

Automatic QC you can trust

Per-sample Ti/Tv ratios, het/hom ratios, missingness rates, Watterson's theta. Every annotation traceable to a specific database version.
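
As a rough guide to what these metrics measure, the sketch below computes each one from toy inputs using the standard textbook definitions. It is not how DASH computes them. The function names and input shapes (lists of ref/alt pairs, lists of 0/1/2 dosages with None for a missing call) are assumptions made for illustration.

```python
# Illustrative only: hypothetical helpers, not Bystro's implementation.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snvs):
    """Transition/transversion ratio for (ref, alt) single-nucleotide pairs."""
    snvs = list(snvs)
    ti = sum(1 for pair in snvs if pair in TRANSITIONS)
    tv = len(snvs) - ti  # assumes every pair is a true SNV (ref != alt)
    return ti / tv if tv else float("nan")


def het_hom_ratio(dosages):
    """Heterozygous calls (dosage 1) divided by homozygous-alt calls (dosage 2)."""
    dosages = list(dosages)
    het = sum(1 for d in dosages if d == 1)
    hom_alt = sum(1 for d in dosages if d == 2)
    return het / hom_alt if hom_alt else float("nan")


def missingness(dosages):
    """Fraction of sites with no genotype call (None)."""
    dosages = list(dosages)
    if not dosages:
        return float("nan")
    return sum(1 for d in dosages if d is None) / len(dosages)


def watterson_theta(segregating_sites, n_chromosomes):
    """Watterson's estimator: S / a_n, where a_n = sum of 1/i for i in 1..n-1."""
    if n_chromosomes < 2:
        raise ValueError("need at least two sampled chromosomes")
    a_n = sum(1.0 / i for i in range(1, n_chromosomes))
    return segregating_sites / a_n
```

For example, `ti_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")])` counts two transitions and one transversion.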

Bystro Agent

THiNK

Not a chatbot. A research collaborator. Ask any genetics question in plain language and THiNK reasons on top of your DASH data, searches the published literature, runs custom statistical analyses, generates visualizations, and verifies every claim before it answers.

What it does

Runs real analysis, not just reasoning

THiNK executes Python directly in a persistent sandbox: GWAS, PCA, logistic regression, survival analysis, clustering, de novo variant detection, kinship estimation, proteomics integration. Results and publication-quality figures embedded in the conversation.

Literature synthesis across 10+ databases

PubMed, bioRxiv, ClinVar, gnomAD, OMIM, Reactome, UniProt, KEGG, PharmGKB, and more, queried at runtime. Every claim cited.

Gigabyte-scale file uploads

VCFs, FragPipe TMT proteomics data, spreadsheets, PDFs, BAMs. Any file type, any size.

Multi-turn depth

Refine hypotheses, explore alternatives, and run follow-ups in one continuous thread. Uncertainty flagged. Sources always cited.

What it reads

Embedded in DASH annotation

ClinVar · gnomAD · dbSNP · CADD · SpliceAI · PhyloP

Live internet search

PubMed · bioRxiv · OMIM · UniProt · Reactome · KEGG · PharmGKB · GTEx · STRING

Support & Resources

Documentation

Get Help

Common Questions

Do I need my own data to get started?

No. THiNK is the most comprehensive answer engine for your questions, with or without data. Public datasets such as the 1000 Genomes Project are also available, so you can explore both platforms immediately.

How does Bystro handle missing data?

Null values are never imputed as 0. A query like gnomad.exomes.af < .01 skips sites without reliable gnomAD data. To include missing sites, use an OR query: gnomad.exomes.af < .01 || !gnomad.exomes.
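
The difference between the two queries can be sketched in plain Python. This is not Bystro's query engine, only an illustration of the null-handling behavior described above. The record layout, site names, and helper functions are hypothetical.

```python
# Illustrative only: models "null is not 0" filtering on toy records.
FIELD = "gnomad.exomes.af"

sites = [
    {"id": "site1", FIELD: 0.001},  # rare in gnomAD
    {"id": "site2", FIELD: 0.2},    # common in gnomAD
    {"id": "site3", FIELD: None},   # no reliable gnomAD data
]


def af_below(record, threshold):
    """Strict filter: a missing frequency never matches."""
    af = record.get(FIELD)
    return af is not None and af < threshold


def af_below_or_missing(record, threshold):
    """OR-style filter: matches rare sites and sites with no frequency."""
    af = record.get(FIELD)
    return af is None or af < threshold


strict = [s["id"] for s in sites if af_below(s, 0.01)]
inclusive = [s["id"] for s in sites if af_below_or_missing(s, 0.01)]
print(strict)     # ['site1']
print(inclusive)  # ['site1', 'site3']
```

Had the missing value been imputed as 0, site3 would have passed the strict filter and been reported as rare, which is the outcome the null handling avoids.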

What genome builds are supported?

Both hg19 (GRCh37) and hg38 (GRCh38) are supported. You select the assembly at upload time on DASH.

View all frequently asked questions →

Ready to get started?

Join researchers worldwide using Bystro to go from raw genetic data to insight, in minutes, not months.