Annotation Field Descriptions
Comprehensive reference for all annotation fields in Bystro's output files. Understanding these fields helps you interpret variant annotations, filter results effectively, and extract meaningful insights from your genomic data.
Field Notation
Italicized fields are custom Bystro fields. All others are sourced from public databases as described.
'!'
';' (e.g., transcripts)
'|'
Basic Fields
Sourced from the input file, or calculated based on input fields
Position & Variant Information
chrom - Chromosome, always prepended with "chr"
pos - Genomic position after Bystro normalizes variant representations
  - Positions always correspond to the first affected base
type - The type of variant
  - VCF format types: SNP, INS, DEL, MULTIALLELIC
  - SNP format types: SNP, INS, DEL, MULTIALLELIC, DENOVO
  - MNPs are decomposed into separate "SNP" rows (future releases will label them as "MNPs" with linkage properties)
  - Multiallelics are decomposed into separate rows but retain the "MULTIALLELIC" type
inputRef - The reference base (always 1 base long)
  - Generated by the input file pre-processor
  - Always the affected reference base at that position
alt - The alternate (non-reference) allele
  - VCF multiallelic and MNP sites are decomposed into individual entries
  - Genotypes are properly segregated per allele
ref - The Bystro-annotated UCSC reference
  - For insertions: always 2 bases long (base before + base after insertion)
  - For deletions: as long as the deletion (up to 32 bases), 1 annotation per deleted base
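As a minimal sketch of working with these columns, the snippet below tallies rows by the type field. It assumes the annotation output is a tab-delimited file with a header row; that layout is an assumption here, and the three-row excerpt and the helper name count_variant_types are invented for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt (not real data); real output carries many more columns.
# The two MULTIALLELIC rows share a position because multiallelic sites are
# decomposed into separate rows that keep the "MULTIALLELIC" type.
SAMPLE_TSV = (
    "chrom\tpos\ttype\tinputRef\talt\n"
    "chr1\t1000\tSNP\tA\tG\n"
    "chr1\t2000\tMULTIALLELIC\tC\tT\n"
    "chr1\t2000\tMULTIALLELIC\tC\tG\n"
)


def count_variant_types(handle):
    """Count rows per value of the `type` column (assumes a tab-delimited header row)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["type"] for row in reader)


counts = count_variant_types(io.StringIO(SAMPLE_TSV))
print(dict(counts))
```

To run the same tally on a file on disk, pass an open text-mode file handle instead of the io.StringIO object.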
Population Genetics
trTv - Transition:transversion ratio for your dataset at this position
heterozygotes - The heterozygous sample labels
heterozygosity - Fraction of samples that are heterozygous for the alternate allele
homozygotes - The homozygous sample labels
homozygosity - Fraction of samples that are homozygous for the alternate allele
missingGenos - Samples that did not have a genotype (e.g., ".")
missingness - Fraction of samples with missing genotypes
ac - The alternate allele count
an - The total non-missing allele count
sampleMaf - The in-sample alternate allele frequency
File Metadata
vcfPos - Original VCF POS, unaffected by Bystro normalization
id - The VCF ID field
discordant - True if the input VCF reference does not match the Bystro-annotated UCSC reference
RefSeq Annotations
refSeq.* annotations are based on RefSeq transcripts. See UCSC refGene and kgXref for details.
Note: When a site is intergenic, all refSeq annotations will be NA. Consequences are annotated for all overlapping RefSeq transcripts and can be matched to their corresponding transcript names.
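The matching described in the note can be sketched as follows. This is an illustration under assumptions, not Bystro's own code: it assumes a row is available as a dict keyed by column name, that ';' separates per-transcript values (as the Field Notation legend suggests), and that values in refSeq.name and refSeq.siteType appear in the same order. The function name and transcript IDs are hypothetical.

```python
def consequences_by_transcript(row, sep=";", missing="NA"):
    """Map each RefSeq transcript name to its siteType for one annotated row.

    Assumes `sep` separates per-transcript values and that both columns list
    transcripts in the same order. Returns {} for intergenic sites (all NA).
    """
    names = row.get("refSeq.name", missing)
    if names == missing:
        return {}
    name_list = names.split(sep)
    site_list = row["refSeq.siteType"].split(sep)
    if len(name_list) != len(site_list):
        # Fail loudly rather than silently mis-pairing values.
        raise ValueError("refSeq.name and refSeq.siteType have different lengths")
    return dict(zip(name_list, site_list))


# Hypothetical row with two overlapping transcripts.
example_row = {
    "refSeq.name": "NM_000001;NM_000002",
    "refSeq.siteType": "exonic;intronic",
}
paired = consequences_by_transcript(example_row)
print(paired)
```

The explicit length check is a design choice for this sketch: a plain zip would stop at the shorter list and hide a mismatch.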
Functional Effects
refSeq.siteType - Effect type on transcript
  - Types: intronic, exonic, UTR3, UTR5, spliceAcceptor, spliceDonor, ncRNA
refSeq.exonicAlleleFunction - Coding effect of the variant
  - Values: synonymous, nonSynonymous, indel-nonFrameshift, indel-frameshift, stopGain, stopLoss, startLoss
  - NA for non-coding siteTypes
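One way to use the values above is to flag rows where any transcript carries a protein-altering effect. The grouping below is an illustrative choice rather than a Bystro definition, and the sketch assumes ';' separates per-transcript values within the field.

```python
# Illustrative grouping: every listed value except "synonymous".
PROTEIN_ALTERING = frozenset({
    "nonSynonymous",
    "indel-nonFrameshift",
    "indel-frameshift",
    "stopGain",
    "stopLoss",
    "startLoss",
})


def is_protein_altering(exonic_allele_function, sep=";"):
    """Return True if any per-transcript value is in PROTEIN_ALTERING.

    "NA" (non-coding siteTypes) is not in the set, so it yields False.
    """
    return any(value in PROTEIN_ALTERING for value in exonic_allele_function.split(sep))


print(is_protein_altering("synonymous;stopGain"))
print(is_protein_altering("NA"))
```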
Protein Impact
refSeq.refCodon - Reference codon from in silico transcription
refSeq.altCodon - In silico transcribed codon after alt allele modification
refSeq.refAminoAcid - Amino acid from in silico translation of reference
refSeq.altAminoAcid - In silico translated amino acid after alt allele
refSeq.codonPosition - Position within codon (1, 2, 3)
refSeq.codonNumber - Codon number within transcript
refSeq.strand - Positive or negative Watson/Crick strand
Gene & Transcript Identifiers
refSeq.name - RefSeq transcript ID
refSeq.name2 - RefSeq gene symbol
refSeq.description - Long form description of RefSeq transcript
refSeq.kgID - UCSC's Known Genes ID
refSeq.mRNA - mRNA ID (transcript ID starting with NM_)
refSeq.ensemblID - Ensembl transcript ID
refSeq.isCanonical - Whether this is the canonical transcript for the gene
Proximity Annotations
nearest.refSeq
Nearest transcript(s) by txStart, txEnd boundaries
nearest.refSeq.name2 - Gene symbol
nearest.refSeq.name - Transcript ID
nearest.refSeq.dist - Distance to transcript
nearestTss.refSeq
Nearest transcript(s) by distance to transcription start site
nearestTss.refSeq.name2 - Gene symbol
nearestTss.refSeq.name - Transcript ID
nearestTss.refSeq.dist - Distance to TSS
External Database Annotations
ClinVar (clinvarVcf)
Clinical significance annotations from ClinVar VCF dataset
clinvarVcf.id - ClinVar VCF ID
clinvarVcf.alt - ALT allele for this site
clinvarVcf.CLNSIG - Germline classification
clinvarVcf.CLNDN - Preferred disease name
clinvarVcf.CLNDNINCL - Disease name for included variants
clinvarVcf.CLNREVSTAT - Review status
clinvarVcf.CLNHGVS - HGVS expression
clinvarVcf.CLNSIGCONF - Conflicting classifications
clinvarVcf.ALLELEID - ClinVar Allele ID
clinvarVcf.AF_ESP - GO-ESP frequencies
clinvarVcf.AF_EXAC - ExAC frequencies
clinvarVcf.AF_TGP - 1000 Genomes frequencies
clinvarVcf.CLNVCSO - Sequence Ontology variant type
clinvarVcf.DBVARID - dbVar NSV accessions
clinvarVcf.ORIGIN - Allele origin
clinvarVcf.SSR - Suspect reason codes
clinvarVcf.RS - dbSNP ID (rs number)
gnomAD Exomes (gnomad.exomes)
Population frequencies from gnomAD exome dataset
gnomad.exomes.alt - ALT allele
gnomad.exomes.id - gnomAD VCF ID
gnomad.exomes.AN - Total allele number
gnomad.exomes.AF - Overall allele frequency
gnomad.exomes.AN_female - Female allele number
gnomad.exomes.AF_female - Female allele frequency
gnomad.exomes.non_cancer_AN - Non-cancer AN
gnomad.exomes.non_cancer_AF - Non-cancer AF
gnomad.exomes.non_neuro_AN - Non-neuro AN
gnomad.exomes.non_neuro_AF - Non-neuro AF
gnomad.exomes.non_topmed_AN - Non-TOPMed AN
gnomad.exomes.non_topmed_AF - Non-TOPMed AF
gnomad.exomes.controls_AN - Controls AN
gnomad.exomes.controls_AF - Controls AF
gnomad.exomes.AN_nfe_seu - Southern European AN
gnomad.exomes.AF_nfe_seu - Southern European AF
gnomad.exomes.AN_nfe_bgr - Bulgarian AN
gnomad.exomes.AF_nfe_bgr - Bulgarian AF
gnomad.exomes.AN_afr - African/African-American AN
gnomad.exomes.AF_afr - African/African-American AF
gnomad.exomes.AN_sas - South Asian AN
gnomad.exomes.AF_sas - South Asian AF
gnomad.exomes.AN_nfe_onf - Other Non-Finnish European AN
gnomad.exomes.AF_nfe_onf - Other Non-Finnish European AF
gnomad.exomes.AN_amr - Latino/Admixed American AN
gnomad.exomes.AF_amr - Latino/Admixed American AF
gnomad.exomes.AN_eas - East Asian AN
gnomad.exomes.AF_eas - East Asian AF
gnomad.exomes.AN_nfe_swe - Swedish AN
gnomad.exomes.AF_nfe_swe - Swedish AF
gnomad.exomes.AN_nfe_nwe - Northwest European AN
gnomad.exomes.AF_nfe_nwe - Northwest European AF
gnomad.exomes.AN_eas_jpn - Japanese AN
gnomad.exomes.AF_eas_jpn - Japanese AF
gnomad.exomes.AN_eas_kor - Korean AN
gnomad.exomes.AF_eas_kor - Korean AF
gnomAD Genomes (gnomad.genomes)
Population frequencies from gnomAD v4 (hg38) or v2.1.1 (hg19) whole-genome dataset
gnomad.genomes.alt - ALT allele
gnomad.genomes.id - gnomAD VCF ID
gnomad.genomes.AN - Total allele number
gnomad.genomes.AF - Overall allele frequency
gnomad.genomes.AN_female - Female allele number
gnomad.genomes.AF_female - Female allele frequency
gnomad.genomes.non_neuro_AN - Non-neuro AN
gnomad.genomes.non_neuro_AF - Non-neuro AF
gnomad.genomes.non_topmed_AN - Non-TOPMed AN
gnomad.genomes.non_topmed_AF - Non-TOPMed AF
gnomad.genomes.controls_AN - Controls AN
gnomad.genomes.controls_AF - Controls AF
gnomad.genomes.AN_nfe_seu - Southern European AN
gnomad.genomes.AF_nfe_seu - Southern European AF
gnomad.genomes.AN_afr - African/African-American AN
gnomad.genomes.AF_afr - African/African-American AF
gnomad.genomes.AN_nfe_onf - Other Non-Finnish European AN
gnomad.genomes.AF_nfe_onf - Other Non-Finnish European AF
gnomad.genomes.AN_amr - Latino/Admixed American AN
gnomad.genomes.AF_amr - Latino/Admixed American AF
gnomad.genomes.AN_eas - East Asian AN
gnomad.genomes.AF_eas - East Asian AF
gnomad.genomes.AN_nfe_nwe - Northwest European AN
gnomad.genomes.AF_nfe_nwe - Northwest European AF
gnomad.genomes.AN_nfe_est - Estonian AN
gnomad.genomes.AF_nfe_est - Estonian AF
gnomad.genomes.AN_nfe - Non-Finnish European AN
gnomad.genomes.AF_nfe - Non-Finnish European AF
gnomad.genomes.AN_fin - Finnish AN
gnomad.genomes.AF_fin - Finnish AF
gnomad.genomes.AN_asj - Ashkenazi Jewish AN
gnomad.genomes.AF_asj - Ashkenazi Jewish AF
gnomad.genomes.AN_oth - Other ancestry AN
gnomad.genomes.AF_oth - Other ancestry AF
dbSNP
dbSNP 155 annotations with population frequencies from multiple studies
dbSNP.id - dbSNP VCF ID
dbSNP.alt - ALT allele
dbSNP.GnomAD - gnomAD v3 frequencies
dbSNP.GnomAD_exomes - gnomAD exome frequencies
dbSNP.1000Genomes - 1000 Genomes frequencies
dbSNP.TOPMED - TOPMED frequencies
dbSNP.ExAC - ExAC frequencies
dbSNP.GoESP - NHLBI ESP frequencies
dbSNP.HapMap - HapMap frequencies
dbSNP.dbGaP_PopFreq - dbGaP aggregated frequencies
dbSNP.TOMMO - Tohoku Medical Megabank
dbSNP.Korea1K - Korea1K dataset
dbSNP.KOREAN - Korean Reference Genome
dbSNP.Vietnamese - Kinh Vietnamese database
dbSNP.GoNL - Genome of Netherlands
dbSNP.GENOME_DK - Danish reference pan genome
dbSNP.NorthernSweden - Northern Sweden samples
dbSNP.TWINSUK - TwinsUK cohort
dbSNP.ALSPAC - ALSPAC cohort
dbSNP.Siberian - Siberian populations
dbSNP.Qatari - Qatar Genome dataset
dbSNP.MGP - Spanish population (MGP)
dbSNP.PRJEB37584 - Project PRJEB37584
dbSNP.SGDP_PRJ - Simons Genome Diversity Project
Deleteriousness Scores
CADD Scores
Combined Annotation Dependent Depletion (CADD) scores are ≥ 0, and higher scores indicate greater predicted deleteriousness. Variants with CADD > 15 are more likely to be deleterious.
cadd (SNPs)
cadd - CADD score for SNPs
caddIndel (Indels & MNPs)
caddIndel.alt - ALT allele
caddIndel.PHRED - CADD PHRED score
Note: Since Bystro decomposes MNPs into "SNP" records, caddIndel may occasionally be populated for SNPs that are part of MNPs.
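Because the score can live in either cadd or caddIndel.PHRED, a filter on CADD > 15 has to check both columns. The sketch below is one hedged way to do that. It assumes a row is a dict of strings, that "NA" marks a missing value, and that each column holds a single numeric value; preferring cadd when both are populated is a choice made for this example, not documented Bystro behavior.

```python
def cadd_phred(row, missing="NA"):
    """Return the first usable CADD value as a float, or None if neither is set.

    Checks `cadd` first, then `caddIndel.PHRED`. Assumes single numeric values;
    a delimited multi-value string would make float() raise ValueError.
    """
    for column in ("cadd", "caddIndel.PHRED"):
        value = row.get(column, missing)
        if value not in (missing, ""):
            return float(value)
    return None


def exceeds_cadd(row, threshold=15.0):
    """True when a CADD value is present and strictly greater than `threshold`."""
    score = cadd_phred(row)
    return score is not None and score > threshold


# Hypothetical rows: a SNP, an indel, and a site with no CADD annotation.
snp_row = {"cadd": "22.4", "caddIndel.PHRED": "NA"}
indel_row = {"cadd": "NA", "caddIndel.PHRED": "9.1"}
empty_row = {"cadd": "NA", "caddIndel.PHRED": "NA"}
print(exceeds_cadd(snp_row), exceeds_cadd(indel_row), exceeds_cadd(empty_row))
```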
Using Field Descriptions
- Filter effectively: Use these field descriptions to build precise queries
- Understand relationships: Match transcript annotations to gene symbols using array ordering
- Population context: Compare your sample frequencies to public database frequencies
- Clinical relevance: Combine ClinVar significance with deleteriousness scores
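As an illustration of the population-context point, the following sketch compares sampleMaf with a public frequency column. It is written under assumptions: a row is a dict of strings, "NA" marks missing data, and each frequency column holds a single value. The helper names to_float and frequency_ratio are hypothetical.

```python
def to_float(value, missing="NA"):
    """Convert an annotation string to float, treating missing markers as None."""
    if value is None or value in (missing, ""):
        return None
    return float(value)


def frequency_ratio(row, reference_col="gnomad.genomes.AF"):
    """Ratio of in-sample allele frequency to a reference database frequency.

    Returns None when either value is missing or the reference frequency is 0,
    since the ratio is undefined in those cases.
    """
    sample = to_float(row.get("sampleMaf"))
    reference = to_float(row.get(reference_col))
    if sample is None or not reference:
        return None
    return sample / reference


# Hypothetical row: the allele is twice as frequent in-sample as in the reference.
row = {"sampleMaf": "0.25", "gnomad.genomes.AF": "0.125"}
ratio = frequency_ratio(row)
print(ratio)
```

Passing reference_col="gnomad.exomes.AF" applies the same comparison to the exome dataset.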