Result Files Description
When your annotation completes, Bystro generates a set of output files containing annotated variants, quality control metrics, ancestry analysis, polygenic risk scores, and machine learning-ready datasets. Here's what each file contains and how to use it.
Core Annotation Files
sample_vcf.annotation.tsv.gz
Main annotation output
Tab-separated file of variant annotations, one row per variant.
Use for: Detailed variant analysis, filtering by annotation criteria, identifying functional variants, generating custom reports.
Format: Block gzipped TSV — decompress with bgzip, gzip, or pigz.
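Because block-gzipped files are also valid gzip, the annotation file can be streamed row by row without decompressing it to disk first. The following is a minimal sketch using only the Python standard library. The function names iter_annotation_rows and preview are illustrative, and the sketch assumes the first line of the file is the column header; it does not assume any particular column names.

```python
import csv
import gzip
from itertools import islice


def iter_annotation_rows(path):
    """Yield one dict per variant row, keyed by the names in the header line."""
    # Block gzip is standard multi-member gzip, so gzip.open can read it.
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        yield from reader


def preview(path, n=5):
    """Return the first n rows without loading the whole file into memory."""
    return list(islice(iter_annotation_rows(path), n))
```

For example, preview('sample_vcf.annotation.tsv.gz') returns the first five variants as dictionaries, which is a quick way to see which columns are present before writing a filter.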
sample_vcf.dosage.feather
Genotype dosage matrix
Machine learning-ready format with variant dosages (0, 1, 2) for each sample, optimized for polygenic risk score calculations.
Use for: Polygenic risk scores, GWAS, machine learning, statistical genetics analyses.
Format: Arrow Feather V2 — supported by Python pandas, polars, R, and Julia.
Structure: First column = chr:pos:ref:alt, remaining columns = per-sample dosages.
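To make the structure concrete, here is a small sketch that splits a chr:pos:ref:alt key into its fields and computes an alternate allele frequency from one row of dosages. The helper names are hypothetical, the frequency calculation assumes diploid samples, and treating None or NaN as a missing dosage is an assumption rather than documented Bystro behavior.

```python
from typing import NamedTuple, Optional, Sequence


class VariantId(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_variant_id(variant_id: str) -> VariantId:
    """Split a 'chr:pos:ref:alt' key into typed fields."""
    chrom, pos, ref, alt = variant_id.split(":")
    return VariantId(chrom, int(pos), ref, alt)


def alt_allele_frequency(dosages: Sequence[Optional[float]]) -> Optional[float]:
    """Alternate allele frequency for one variant, assuming diploid dosages.

    Missing values (None or NaN) are skipped; that encoding is an assumption.
    """
    # d == d is False only for NaN, so this drops both None and NaN.
    observed = [d for d in dosages if d is not None and d == d]
    if not observed:
        return None
    # Each diploid sample carries two alleles, so divide the dosage sum by 2N.
    return sum(observed) / (2 * len(observed))
```

With a matrix loaded through pandas, the keys would come from the first column and each remaining row slice would be passed as the dosages argument.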
Quality Control & Metadata
Sample Information & Statistics
sample_vcf.sample_list: List of included samples
sample_vcf.statistics.tsv: Sample QC statistics (TSV)
sample_vcf.statistics.json: Sample QC statistics (JSON)
sample_vcf.statistics.qc.tsv: Samples failing QC (> 3 std dev)
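The "> 3 std dev" criterion can be illustrated with a short sketch. This is not Bystro's implementation: which metrics are tested and how the standard deviation is estimated are not stated here, so the example assumes one hypothetical metric per sample and uses the sample standard deviation.

```python
import statistics


def flag_outliers(metric_by_sample, threshold=3.0):
    """Return sample IDs whose metric lies more than `threshold` SDs from the mean."""
    values = list(metric_by_sample.values())
    if len(values) < 2:
        return []  # a standard deviation needs at least two samples
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (an assumption)
    if sd == 0:
        return []  # all samples identical, nothing to flag
    return [s for s, v in metric_by_sample.items() if abs(v - mean) / sd > threshold]
```

Note that with very few samples no value can exceed three standard deviations, so a rule like this only becomes informative for larger cohorts.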
Configuration & Documentation
hg19.yml / hg38.yml: Annotation configuration file
sample_vcf.annotation.header.json: Column descriptions
sample_vcf.annotation.log.txt: Processing log file
bystro_annotation.complete: Completion marker
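Before starting downstream work, a script can confirm that the completion marker and the main annotation files are present. The sketch below is one way to do that; it assumes the files sit at the top level of the extracted directory and share the sample_vcf prefix used in this description, and the function name missing_outputs is hypothetical.

```python
from pathlib import Path


def missing_outputs(results_dir, prefix="sample_vcf"):
    """Return the expected result files that are absent from results_dir."""
    expected = [
        "bystro_annotation.complete",
        f"{prefix}.annotation.tsv.gz",
        f"{prefix}.annotation.header.json",
        f"{prefix}.annotation.log.txt",
    ]
    root = Path(results_dir)
    return [name for name in expected if not (root / name).exists()]
```

An empty list means every listed file was found; otherwise the returned names show what to look for in the processing log.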
Performance & Cache Files
sample_vcf.dosage.feather.index.gz: Dosage matrix index
filtered_dosage_matrices: Cached filtered data
Advanced Analysis Results
Polygenic Risk Scores (PRS)
Risk predictions based on GWAS summary statistics. Each GWAS study produces a separate PRS file, e.g.:
sample_vcf.AD.30617256.prs.tsv: Alzheimer's Disease GWAS
sample_vcf.AD.35379992.prs.tsv: Alternative AD GWAS
sample_vcf.IBD.PMIDLIU.prs.tsv: Inflammatory Bowel Disease
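When several PRS files are present, it can help to group them by trait. The sketch below infers a naming pattern of prefix, trait label, study label, then .prs.tsv from the example names above; that pattern is an assumption drawn from those examples, not a documented guarantee, and the function names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path

SUFFIX = ".prs.tsv"


def parse_prs_filename(filename):
    """Return (prefix, trait, study) for '<prefix>.<trait>.<study>.prs.tsv', else None."""
    name = Path(filename).name
    if not name.endswith(SUFFIX):
        return None
    # Split from the right so a prefix that itself contains dots stays intact.
    parts = name[: -len(SUFFIX)].rsplit(".", 2)
    if len(parts) != 3:
        return None
    return tuple(parts)


def group_by_trait(filenames):
    """Map each trait label to the list of study labels found for it."""
    groups = defaultdict(list)
    for filename in filenames:
        parsed = parse_prs_filename(filename)
        if parsed is not None:
            _, trait, study = parsed
            groups[trait].append(study)
    return dict(groups)
```

Files that do not end in .prs.tsv are ignored, so the full directory listing can be passed in directly.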
Download CSV files directly from the web interface, or use the CLI tools to convert the TSV output.
Ancestry Analysis
Genetic ancestry inference using principal component analysis and reference populations.
ancestry_results.json: Primary output file
Download CSV directly from the web interface, or use the Bystro CLI to convert JSON results.
Variant Reports
Curated reports highlighting variants of clinical or research significance. Available for download through the Dashboard UI.
Working with Your Results
Extract the archive
Your results come as a compressed file ending in .tar.gz. Extract it to access the individual files.
Windows
Built-in (Windows 10/11): Right-click the file → "Extract All"
Alternative: Download 7-Zip (free) if built-in extraction doesn't work
Mac
Double-click the file — it will extract automatically
Linux / Command Line
tar -xzf sample_vcf_results.tar.gz
Choose your analysis path
Detailed Variant Analysis
Use .tsv.gz for comprehensive variant annotation analysis
Machine Learning
Use .feather for ML workflows and statistical genetics
Review quality control
Check the statistics files to identify samples that may need exclusion from downstream analysis.
Access advanced results
Download ancestry and PRS results directly from the web interface, or convert using CLI tools.
File Format Details
Compression Formats
Block gzip (.tsv.gz): Decompress with bgzip, gzip, or pigz.
Arrow Feather (.feather): Native binary format, no decompression needed.
Loading the Dosage Matrix
Python
import pandas as pd
df = pd.read_feather('sample_vcf.dosage.feather')
R
library(arrow)
df <- read_feather('sample_vcf.dosage.feather')
Filter in the UI first before downloading for spreadsheet work; annotation files can be too large to open directly.
Check QC files to identify potential sample quality issues before running downstream analyses.
Keep the config file for reproducibility and method documentation.
Use the CLI tools or the Dashboard download to get ancestry and PRS results in CSV format.