Result Files Description

When your annotation completes, Bystro generates a comprehensive set of output files containing annotated variants, quality control metrics, ancestry analysis, polygenic risk scores, and machine learning-ready datasets. This page describes what each file contains and how to use it.

Download package

Your results are delivered as a compressed tarball containing multiple files. Extract the archive to access individual components for downstream analysis.

Core Annotation Files

sample_vcf.annotation.tsv.gz

Main annotation output

Tab-separated file with comprehensive variant annotations, one row per variant with extensive genomic information.

Use for: Detailed variant analysis, filtering by annotation criteria, identifying functional variants, generating custom reports.

Format: Block gzipped TSV — decompress with bgzip, gzip, or pigz.

Large file warning

This file can be enormous for large cohorts (billions of variants across thousands of samples). Opening it directly in Excel is not recommended: a worksheet holds at most 1,048,576 rows. Use the Bystro web interface to filter and subset data, then download smaller filtered results for spreadsheet analysis.
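When the file is too large to load at once, it can be streamed one row at a time. The sketch below uses only the Python standard library. It assumes the first line of the file is a header row, and the column name in the usage comment is purely illustrative; the real column names are described in sample_vcf.annotation.header.json.

```python
import csv
import gzip
from typing import Callable, Dict, Iterator


def stream_annotation_rows(
    path: str,
    keep: Callable[[Dict[str, str]], bool] = lambda row: True,
) -> Iterator[Dict[str, str]]:
    """Yield one dict per variant row without loading the whole file."""
    # gzip.open also reads block-gzipped (bgzip) files, because they are
    # valid multi-member gzip streams.
    with gzip.open(path, mode="rt", newline="") as handle:
        # Assumption: the first line holds the column names.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if keep(row):
                yield row


# Usage (the "chrom" column name is hypothetical):
# for row in stream_annotation_rows(
#     "sample_vcf.annotation.tsv.gz",
#     keep=lambda r: r.get("chrom") == "chr1",
# ):
#     print(row)
```

Because rows are yielded lazily, memory use stays roughly constant regardless of file size.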

sample_vcf.dosage.feather

Genotype dosage matrix

Machine learning-ready format with variant dosages (0, 1, 2) for each sample, optimized for polygenic risk score calculations.

Use for: Polygenic risk scores, GWAS, machine learning, statistical genetics analyses.

Format: Arrow Feather V2 — supported by Python pandas, polars, R, and Julia.

Structure: First column = chr:pos:ref:alt, remaining columns = per-sample dosages.
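To join dosages against other data, it is often convenient to split the chr:pos:ref:alt identifier into separate fields. A minimal sketch follows; the function and type names are hypothetical, and it assumes the identifier always has exactly four colon-separated parts.

```python
from typing import NamedTuple


class VariantKey(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_variant_id(variant_id: str) -> VariantKey:
    """Split a 'chr:pos:ref:alt' identifier into typed fields."""
    parts = variant_id.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected chr:pos:ref:alt, got {variant_id!r}")
    chrom, pos, ref, alt = parts
    # Position is converted to int so variants can be sorted numerically.
    return VariantKey(chrom, int(pos), ref, alt)
```

For example, parse_variant_id("chr1:12345:A:T") yields the fields chr1, 12345, A, and T.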

Quality Control & Metadata

Sample Information & Statistics

sample_vcf.sample_list: List of included samples
sample_vcf.statistics.tsv: Sample QC statistics (TSV)
sample_vcf.statistics.json: Sample QC statistics (JSON)
sample_vcf.statistics.qc.tsv: Samples failing QC (> 3 standard deviations)
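The 3 standard deviation rule can be reproduced on any per-sample metric if you want to apply a stricter or looser threshold yourself. The sketch below is an illustration of that kind of rule, not Bystro's implementation: which metrics Bystro tests and whether it uses the sample or population standard deviation are not stated here, so both are assumptions.

```python
from statistics import mean, stdev
from typing import Dict, List


def flag_outliers(values: Dict[str, float], n_sd: float = 3.0) -> List[str]:
    """Return sample IDs whose value lies more than n_sd SDs from the mean."""
    data = list(values.values())
    if len(data) < 2:
        return []  # a standard deviation needs at least two samples
    center = mean(data)
    spread = stdev(data)  # sample standard deviation (assumption)
    if spread == 0:
        return []  # all values identical, nothing to flag
    return [
        sample
        for sample, value in values.items()
        if abs(value - center) > n_sd * spread
    ]
```

Note that with very few samples no value can reach 3 standard deviations from the mean, so a rule like this is only informative for reasonably sized cohorts.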

Configuration & Documentation

hg19.yml / hg38.yml: Annotation configuration file
sample_vcf.annotation.header.json: Column descriptions
sample_vcf.annotation.log.txt: Processing log file
bystro_annotation.complete: Completion marker

Performance & Cache Files

sample_vcf.dosage.feather.index.gz: Dosage matrix index
filtered_dosage_matrices: Cached filtered data

Advanced Analysis Results

Polygenic Risk Scores (PRS)

Risk predictions based on GWAS summary statistics. Each GWAS study produces a separate PRS file, e.g.:

sample_vcf.AD.30617256.prs.tsv: Alzheimer's Disease GWAS
sample_vcf.AD.35379992.prs.tsv: Alternative AD GWAS
sample_vcf.IBD.PMIDLIU.prs.tsv: Inflammatory Bowel Disease
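When a results folder holds many PRS files, the trait and study can be read back from the file name. The sketch below infers a prefix.trait.study.prs.tsv pattern from the three examples above; that pattern is an assumption, not a documented naming contract, and the function name is hypothetical.

```python
import re
from typing import NamedTuple, Optional


class PrsFile(NamedTuple):
    prefix: str
    trait: str
    study: str


# Assumed pattern: <prefix>.<trait>.<study>.prs.tsv
_PRS_NAME = re.compile(
    r"^(?P<prefix>.+)\.(?P<trait>[^.]+)\.(?P<study>[^.]+)\.prs\.tsv$"
)


def parse_prs_filename(name: str) -> Optional[PrsFile]:
    """Return the parts of a PRS file name, or None if it does not match."""
    match = _PRS_NAME.match(name)
    if match is None:
        return None
    return PrsFile(match["prefix"], match["trait"], match["study"])
```

Files that do not follow the assumed pattern return None, so the function can double as a filter when listing a directory.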

Download CSV files directly from the web interface, or use CLI tools to convert.
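If neither option is at hand, a TSV file can also be rewritten as CSV with the Python standard library. This is a generic conversion, not the Bystro CLI, and the function name is hypothetical.

```python
import csv


def tsv_to_csv(tsv_path: str, csv_path: str) -> int:
    """Rewrite a tab-separated file as comma-separated; return rows written."""
    rows_written = 0
    with open(tsv_path, newline="") as src, open(csv_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst)  # quotes any field that contains a comma
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written


# Usage:
# tsv_to_csv("sample_vcf.AD.30617256.prs.tsv", "sample_vcf.AD.30617256.prs.csv")
```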

Ancestry Analysis

Genetic ancestry inference using principal component analysis and reference populations.

ancestry_results.json: Primary output file

Download CSV directly from the web interface, or use the Bystro CLI to convert JSON results.

Variant Reports

Curated reports highlighting variants of clinical or research significance. Available for download through the Dashboard UI.

Working with Your Results

Extract the archive

Your results come as a compressed file ending in .tar.gz. Extract it to access the individual files.

Windows

Built-in (Windows 10/11): Right-click the file → "Extract All"

Alternative: Download 7-Zip (free) if built-in extraction doesn't work

Mac

Double-click the file — it will extract automatically

Linux / Command Line

tar -xzf sample_vcf_results.tar.gz
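The same extraction can be scripted in Python, which also makes it easy to confirm that the bystro_annotation.complete marker is present before starting downstream work. This is a sketch under assumptions: the function name is hypothetical, and because the layout inside the archive is not specified here, the marker is searched for recursively.

```python
import tarfile
from pathlib import Path


def extract_results(archive: str, dest: str) -> bool:
    """Extract a .tar.gz archive into dest; report whether the marker exists."""
    out_dir = Path(dest)
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: reject unsafe members such as
            # absolute paths or links pointing outside dest.
            tar.extractall(out_dir, filter="data")
        else:
            tar.extractall(out_dir)
    # Assumption: the marker may sit at the top level or in a subdirectory.
    return any(out_dir.rglob("bystro_annotation.complete"))


# Usage:
# if not extract_results("sample_vcf_results.tar.gz", "results"):
#     print("Completion marker missing; the annotation may be incomplete.")
```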

Choose your analysis path

Detailed Variant Analysis

Use .tsv.gz for comprehensive variant annotation analysis

Machine Learning

Use .feather for ML workflows and statistical genetics

Review quality control

Check the statistics files to identify samples that may need exclusion from downstream analysis.

Access advanced results

Download ancestry and PRS results directly from the web interface, or convert using CLI tools.

File Format Details

Compression Formats

Block gzip (.tsv.gz): Decompress with bgzip, gzip, or pigz.

Arrow Feather (.feather): Native binary format, no decompression needed.

Loading the Dosage Matrix

Python

# pandas reads Feather files through pyarrow, which must be installed
import pandas as pd
df = pd.read_feather('sample_vcf.dosage.feather')

R

library(arrow)
df <- read_feather('sample_vcf.dosage.feather')
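Once the matrix is loaded in Python, a common first check is the alternate allele frequency of each variant. The sketch below relies on the structure described earlier (first column is the variant identifier, remaining columns are per-sample dosages). It assumes diploid genotypes, numeric dosage columns, and missing genotypes stored as nulls; none of these is confirmed here, and the function name is hypothetical.

```python
import pandas as pd


def alt_allele_frequency(dosage: pd.DataFrame) -> pd.Series:
    """Per-variant alternate allele frequency from a dosage table."""
    # First column holds the chr:pos:ref:alt identifier; the rest are samples.
    indexed = dosage.set_index(dosage.columns[0])
    # Mean dosage divided by 2 assumes diploid genotypes; nulls are skipped.
    return indexed.mean(axis=1) / 2


# Usage:
# df = pd.read_feather("sample_vcf.dosage.feather")
# frequencies = alt_allele_frequency(df)
```

The result is indexed by variant identifier, so individual variants can be looked up directly.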

Pro tips

Filter in the UI first before downloading for spreadsheet work — annotation files can be too large to open directly.

Check QC files to identify potential sample quality issues before running downstream analyses.

Keep the config file for reproducibility and method documentation.

Use CLI tools or Dashboard download for converting ancestry and PRS results to CSV format.

Next Steps

Explore field descriptions — Understand every annotation column in the output TSV.

Learn filtering techniques — Filter and subset your annotated variants for analysis.