Getting Started
Welcome to Bystro, the world's first natural-language AI platform for genetics and proteomics. Ask questions in plain English and get research-grade analysis instantly. No code required.
Start with our onboarding videos for a guided walkthrough, or explore a public dataset to try both platforms without uploading your own data.
Two platforms, one workflow
DASH produces the grounded data layer: annotated, versioned, and reproducible. THiNK reasons on top of it. When THiNK surfaces a finding worth a closer look, jump to DASH to verify it against the raw annotation, then bring that context back into your conversation.
Bystro Classic
DASH
The annotation and dashboard layer. Upload a VCF file and get back a fully annotated dataset with 200+ fields drawn from seven curated databases, covering ancestry, polygenic risk scores, clinical significance, population frequencies, and more. Fast, versioned, reproducible.
gnomAD · dbSNP · CADD · + more databases
What it does
Natural-language variant search
Every one of 200+ annotation fields is indexed. Filter by any arbitrary combination across millions of variants and thousands of samples. No query language required.
Cross-ancestry polygenic risk scores
Ancestry-corrected scores from any GWAS with available summary statistics. Built-in studies for Alzheimer's and IBD, or bring your own.
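For readers new to polygenic risk scores, the core calculation is a weighted sum of allele dosages, with weights taken from GWAS summary statistics. The sketch below shows only that textbook sum. It does not reproduce Bystro's ancestry correction, and the variant IDs, effect sizes, and the function name polygenic_score are hypothetical values chosen for illustration.

```python
from typing import Mapping, Optional


def polygenic_score(
    dosages: Mapping[str, Optional[float]],
    weights: Mapping[str, float],
) -> float:
    """Return the sum of effect size * alternate-allele dosage.

    Variants with a missing dosage contribute nothing in this sketch;
    production pipelines often impute them instead.
    """
    score = 0.0
    for variant_id, beta in weights.items():
        dosage = dosages.get(variant_id)
        if dosage is None:
            continue  # genotype not called for this sample
        score += beta * dosage
    return score


# Hypothetical GWAS effect sizes (betas) and one sample's dosages (0, 1, or 2).
example_weights = {"rs_example_1": 0.12, "rs_example_2": -0.05, "rs_example_3": 0.30}
example_dosages = {"rs_example_1": 2, "rs_example_2": 1, "rs_example_3": None}

example_score = polygenic_score(example_dosages, example_weights)
print(round(example_score, 4))
```

A raw sum like this is only comparable between samples scored on the same variants, which is one reason real scores are usually standardized or adjusted for ancestry.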
Rich pathogenicity scoring
CADD PHRED scores for SNPs and indels, SpliceAI and Pangolin splice predictions, PolyPhen-2 and SIFT missense impact, PhyloP conservation.
Multi-sample cohort support
Single jobs spanning thousands of samples. Arrow Feather v2 dosage matrices ready for downstream ML pipelines and GWAS frameworks.
Automatic QC you can trust
Per-sample Ti/Tv ratios, het/hom ratios, missingness rates, Watterson's theta. Every annotation traceable to a specific database version.
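To make these QC metrics concrete, here is a minimal sketch using their standard textbook definitions. How DASH computes them internally (for example, which sites are counted or how theta is normalized) is not described here, so treat the function names and toy inputs as assumptions.

```python
from typing import List, Optional, Sequence, Tuple

# Transitions swap purine with purine (A<->G) or pyrimidine with pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snps: Sequence[Tuple[str, str]]) -> Optional[float]:
    """Transitions divided by transversions for (ref, alt) SNP pairs."""
    ti = sum(1 for pair in snps if pair in TRANSITIONS)
    tv = len(snps) - ti
    return ti / tv if tv else None  # undefined when there are no transversions


def missingness_rate(genotypes: Sequence[Optional[int]]) -> float:
    """Fraction of genotype calls that are missing (None)."""
    if not genotypes:
        raise ValueError("no genotype calls supplied")
    return sum(1 for g in genotypes if g is None) / len(genotypes)


def wattersons_theta(segregating_sites: int, n_chromosomes: int) -> float:
    """Watterson's estimator: S / a_n, where a_n = sum of 1/i for i in 1..n-1."""
    if n_chromosomes < 2:
        raise ValueError("need at least two chromosomes")
    a_n = sum(1.0 / i for i in range(1, n_chromosomes))
    return segregating_sites / a_n


# Toy inputs, invented for illustration.
toy_snps: List[Tuple[str, str]] = [
    ("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "C"), ("G", "T"),
]
toy_calls: List[Optional[int]] = [0, 1, None, 2, 0, None, 1, 1]

print(ti_tv_ratio(toy_snps))                # 4 transitions / 2 transversions
print(missingness_rate(toy_calls))          # 2 missing of 8 calls
print(round(wattersons_theta(10, 4), 4))    # 10 / (1 + 1/2 + 1/3)
```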
Bystro Agent
THiNK
Not a chatbot. A research collaborator. Ask any genetics question in plain language and THiNK reasons on top of your DASH data, searches the published literature, runs custom statistical analyses, generates visualizations, and verifies every claim before it answers.
What it does
Runs real analysis, not just reasoning
THiNK executes Python directly in a persistent sandbox: GWAS, PCA, logistic regression, survival analysis, clustering, de novo variant detection, kinship estimation, proteomics integration. Results and publication-quality figures embedded in the conversation.
Literature synthesis across 10+ databases
PubMed, bioRxiv, ClinVar, gnomAD, OMIM, Reactome, UniProt, KEGG, PharmGKB, and more, queried at runtime. Every claim cited.
Gigabyte-scale file uploads
VCFs, FragPipe TMT proteomics data, spreadsheets, PDFs, BAMs. Any file type, any size.
Multi-turn depth
Refine hypotheses, explore alternatives, and run follow-ups in one continuous thread. Uncertainty flagged. Sources always cited.
What it reads
Embedded in DASH annotation
Live internet search
Support & Resources
Documentation
- Onboarding Videos
Step-by-step walkthroughs of both platforms
- Vignettes
Real-world analysis examples and use cases
- Annotation Field Reference
Every annotation field explained with examples
Get Help
- Email Support
Direct line to our team
- FAQ
Common questions about annotation, search, and data
- Community Discord
Ask questions and connect with other Bystro users
Common Questions
Do I need my own data to get started?
No. THiNK answers genetics questions with or without your own data, and public datasets such as the 1000 Genomes Project let you explore both platforms immediately.
How does Bystro handle missing data?
Null values are never imputed as 0. A query like gnomad.exomes.af < .01 skips sites without reliable gnomAD data. To include missing sites, use an OR query: gnomad.exomes.af < .01 || !gnomad.exomes.
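The difference between the two queries can be illustrated outside Bystro with a small Python sketch. It mimics only the null-handling semantics described above, not Bystro's query engine, and the site records and the key name gnomad_exomes_af are made up for the example.

```python
from typing import Dict, List, Optional

# Hypothetical annotated sites; None means gnomAD has no reliable value there.
example_sites: List[Dict[str, Optional[float]]] = [
    {"id": "site1", "gnomad_exomes_af": 0.002},
    {"id": "site2", "gnomad_exomes_af": 0.25},
    {"id": "site3", "gnomad_exomes_af": None},
]


def rare_only(sites, threshold: float = 0.01) -> List[str]:
    """Analogue of `af < threshold`: a missing value never matches."""
    return [
        s["id"]
        for s in sites
        if s["gnomad_exomes_af"] is not None and s["gnomad_exomes_af"] < threshold
    ]


def rare_or_missing(sites, threshold: float = 0.01) -> List[str]:
    """Analogue of the OR query: match rare sites and sites with no value."""
    return [
        s["id"]
        for s in sites
        if s["gnomad_exomes_af"] is None or s["gnomad_exomes_af"] < threshold
    ]


print(rare_only(example_sites))        # ['site1']
print(rare_or_missing(example_sites))  # ['site1', 'site3']
```

Keeping missing values distinct from 0 means a site absent from gnomAD is never silently reported as a confirmed rare variant; you opt in to those sites explicitly.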
What genome builds are supported?
Both hg19 (GRCh37) and hg38 (GRCh38) are supported. You select the assembly at upload time on DASH.
Ready to get started?
Join researchers worldwide using Bystro to go from raw genetic data to insight, in minutes, not months.