Result Files Description

When your annotation completes, Bystro generates a comprehensive set of output files containing annotated variants, quality control metrics, ancestry analysis, polygenic risk scores, and machine learning-ready datasets. Here's what each file contains and how to use it.

Download package
Your results are delivered as a compressed tarball containing multiple files. Extract the archive to access individual components for downstream analysis.

Core Annotation Files

sample_vcf.annotation.tsv.gz

Main annotation output

Tab-separated file with one row per variant and one column per annotation field. Column descriptions are in sample_vcf.annotation.header.json.

Use for: Detailed variant analysis, filtering by annotation criteria, identifying functional variants, generating custom reports.

Format: Block-gzipped (BGZF) TSV; decompress with bgzip, gzip, or pigz.

Large file warning
This file can be enormous for large cohorts (billions of variants across thousands of samples), far beyond what Excel can open: a worksheet holds at most 1,048,576 rows. Use the Bystro web interface to filter and subset the data, then download the smaller filtered results for spreadsheet analysis.
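If you prefer scripting to the web interface, the file can be streamed in chunks instead of loaded whole. The sketch below uses pandas and assumes the first line is a header row; `column` and `value` are placeholders for whatever field you filter on (look up the real names in sample_vcf.annotation.header.json). Because BGZF is a series of gzip members, pandas' ordinary gzip decompression reads it.

```python
import pandas as pd


def filter_annotation(path, column, value, out_path, chunksize=100_000):
    """Stream a block-gzipped TSV and keep rows where `column` equals `value`.

    `column` and `value` are placeholders, not Bystro field names.
    Only `chunksize` rows are held in memory at once.
    Returns the number of rows written to `out_path` (plain TSV).
    """
    kept = 0
    first = True
    # BGZF is a series of gzip members, so ordinary gzip decompression reads it.
    with pd.read_csv(
        path,
        sep="\t",
        compression="gzip",
        dtype=str,              # keep every field as text; no type coercion
        keep_default_na=False,  # keep strings such as "NA" verbatim
        chunksize=chunksize,
    ) as reader:
        for chunk in reader:
            hits = chunk[chunk[column] == value]
            # Write the header once with the first chunk, then append.
            hits.to_csv(out_path, sep="\t", index=False,
                        header=first, mode="w" if first else "a")
            first = False
            kept += len(hits)
    return kept
```

For example, `filter_annotation("sample_vcf.annotation.tsv.gz", "<column>", "<value>", "subset.tsv")` writes a subset small enough for a spreadsheet.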

sample_vcf.dosage.feather

Genotype dosage matrix

Machine learning-ready matrix of per-sample variant dosages (0, 1, or 2 copies of the alternate allele), optimized for polygenic risk score calculations.

Use for: Polygenic risk scores, GWAS, machine learning, statistical genetics analyses.

Format: Arrow Feather V2 — supported by Python pandas, polars, R, and Julia.

Structure: the first column holds the variant ID (chr:pos:ref:alt); each remaining column holds one sample's dosages.
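Because each dosage counts alternate alleles, a basic additive score for sample i is the sum over variants j of beta_j × dosage(i, j). A minimal sketch of that textbook calculation, assuming you supply per-variant effect sizes (`weights`, hypothetical) keyed by the same chr:pos:ref:alt IDs and measured for the alternate allele. It is not necessarily how Bystro computes its own PRS files.

```python
import pandas as pd


def additive_prs(dosages: pd.DataFrame, weights: pd.Series) -> pd.Series:
    """Return one score per sample: sum of effect size x alt-allele dosage.

    `dosages`: first column = "chr:pos:ref:alt" IDs, other columns = samples.
    `weights`: hypothetical effect sizes indexed by the same variant IDs.
    Variants absent from either input are skipped; missing dosages are
    treated as 0, which is a simplification.
    """
    id_col = dosages.columns[0]
    matrix = dosages.set_index(id_col)
    shared = matrix.index.intersection(weights.index)
    aligned = matrix.loc[shared].fillna(0)
    # Scale each variant row by its effect size, then sum over variants.
    return aligned.mul(weights.loc[shared], axis=0).sum(axis=0)
```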

Quality Control & Metadata

Sample Information & Statistics

  • sample_vcf.sample_list: List of included samples
  • sample_vcf.statistics.tsv: Sample QC statistics (TSV)
  • sample_vcf.statistics.json: Sample QC statistics (JSON)
  • sample_vcf.statistics.qc.tsv: Samples failing QC (more than 3 standard deviations from the mean)
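The 3-standard-deviation rule flags a sample when a QC metric lies more than three standard deviations from the cohort mean. A minimal sketch of that rule, assuming a table with one row per sample and a numeric metric column; the column name is a placeholder, and Bystro's exact metrics and standard-deviation convention may differ.

```python
import pandas as pd


def flag_outliers(stats: pd.DataFrame, metric: str, n_sd: float = 3.0) -> pd.Series:
    """Return a boolean Series: True where `metric` is more than `n_sd`
    standard deviations away from the mean across all samples."""
    values = stats[metric].astype(float)
    # pandas .std() is the sample standard deviation (ddof=1).
    z = (values - values.mean()) / values.std()
    return z.abs() > n_sd
```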

Configuration & Documentation

  • hg19.yml / hg38.yml: Annotation configuration file
  • sample_vcf.annotation.header.json: Column descriptions
  • sample_vcf.annotation.log.txt: Processing log file
  • bystro_annotation.complete: Completion marker
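Scripts that pick up results automatically can check for the completion marker, and for the core outputs, before reading anything. A small sketch with pathlib; it assumes the files sit directly in `results_dir` after extraction and that "sample_vcf" is replaced by your job's file prefix.

```python
from pathlib import Path

# Core outputs named in this guide; "sample_vcf" stands in for your job's prefix.
CORE_SUFFIXES = [".annotation.tsv.gz", ".dosage.feather", ".statistics.tsv"]


def check_results(results_dir, prefix="sample_vcf"):
    """Return (complete, missing): whether bystro_annotation.complete exists,
    and which core files from CORE_SUFFIXES are absent from `results_dir`."""
    d = Path(results_dir)
    complete = (d / "bystro_annotation.complete").is_file()
    missing = [prefix + s for s in CORE_SUFFIXES if not (d / (prefix + s)).is_file()]
    return complete, missing
```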

Performance & Cache Files

  • sample_vcf.dosage.feather.index.gz: Dosage matrix index
  • filtered_dosage_matrices: Cached filtered data

Advanced Analysis Results

Polygenic Risk Scores (PRS)

Risk predictions based on GWAS summary statistics. Each GWAS study produces a separate PRS file, e.g.:

  • sample_vcf.AD.30617256.prs.tsv: Alzheimer's Disease GWAS
  • sample_vcf.AD.35379992.prs.tsv: Alternative AD GWAS
  • sample_vcf.IBD.PMIDLIU.prs.tsv: Inflammatory Bowel Disease

Download CSV versions directly from the web interface, or use the CLI tools to convert the TSV files.
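If the .prs.tsv files are already on your machine, pandas can also do a plain TSV-to-CSV conversion. This is a generic sketch, not the Bystro CLI; it assumes each file is a tab-separated table with a header row and writes a .csv next to each input.

```python
from pathlib import Path

import pandas as pd


def prs_tsv_to_csv(results_dir):
    """Convert every *.prs.tsv file in `results_dir` to a sibling .csv file.

    Values are copied as text, unchanged. Returns the CSV paths written.
    """
    written = []
    for tsv in sorted(Path(results_dir).glob("*.prs.tsv")):
        table = pd.read_csv(tsv, sep="\t", dtype=str, keep_default_na=False)
        csv_path = tsv.with_suffix(".csv")  # e.g. sample_vcf.AD.30617256.prs.csv
        table.to_csv(csv_path, index=False)
        written.append(csv_path)
    return written
```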

Ancestry Analysis

Genetic ancestry inference using principal component analysis and reference populations.

ancestry_results.json: Primary output file

Download a CSV version directly from the web interface, or use the Bystro CLI to convert the JSON results.

Variant Reports

Curated reports highlighting variants of clinical or research significance. Available for download through the Dashboard UI.

Working with Your Results

Step 1: Extract the archive

Your results come as a compressed file ending in .tar.gz. Extract it to access the individual files.

Windows

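Built-in (Windows 11): Right-click the file → "Extract All". Recent Windows 11 updates added .tar.gz support to File Explorer; on Windows 10, "Extract All" works only for .zip files.

Alternative: Download 7-Zip (free), or run the tar command shown under Linux / Command Line from Command Prompt or PowerShell (tar ships with current Windows 10 and 11 releases).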

Mac

Double-click the file — it will extract automatically

Linux / Command Line

tar -xzf sample_vcf_results.tar.gz
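On any platform with Python 3, the standard-library tarfile module can extract the archive as well. A minimal sketch; the safer `filter="data"` option is used only when the running Python provides it (3.12+ and some earlier patch releases).

```python
import tarfile
from pathlib import Path


def extract_results(archive, dest="."):
    """Extract a .tar.gz results archive into `dest`; return the member names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Rejects absolute paths, ".." traversal, and other unsafe members.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names
```

For example, `extract_results("sample_vcf_results.tar.gz", "results")` unpacks everything into a results folder.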

Step 2: Choose your analysis path

Detailed Variant Analysis

Use .tsv.gz for comprehensive variant annotation analysis

Machine Learning

Use .feather for ML workflows and statistical genetics

Step 3: Review quality control

Check the statistics files, especially sample_vcf.statistics.qc.tsv, to identify samples that may need to be excluded from downstream analysis.
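Once you have decided which samples to exclude, you can drop them from the dosage matrix before analysis. A sketch assuming the excluded sample IDs are already in a Python list (the column layout of sample_vcf.statistics.qc.tsv is not described here, so parsing it is left to you) and that those IDs match the dosage matrix column names.

```python
import pandas as pd


def drop_samples(dosages: pd.DataFrame, failing) -> pd.DataFrame:
    """Return a copy of the dosage matrix without the sample columns in `failing`.

    The first (variant ID) column is always kept; unknown IDs are ignored.
    """
    id_col = dosages.columns[0]
    to_drop = [s for s in failing if s in dosages.columns and s != id_col]
    return dosages.drop(columns=to_drop)
```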

Step 4: Access advanced results

Download ancestry and PRS results directly from the web interface, or convert using CLI tools.

File Format Details

Compression Formats

Block gzip (.tsv.gz): Decompress with bgzip, gzip, or pigz. BGZF files are valid gzip files, so any standard gzip tool or library can read them.

Arrow Feather (.feather): Native binary format, no decompression needed.

Loading the Dosage Matrix

Python

# pandas reads Feather files through pyarrow: pip install pandas pyarrow
import pandas as pd
df = pd.read_feather('sample_vcf.dosage.feather')

R

library(arrow)
df <- read_feather('sample_vcf.dosage.feather')

Pro tips

Filter in the UI before downloading for spreadsheet work; annotation files can be too large to open directly.

Check QC files to identify potential sample quality issues before running downstream analyses.

Keep the configuration file (hg19.yml or hg38.yml) for reproducibility and method documentation.

Use the CLI tools or the Dashboard download to convert ancestry and PRS results to CSV.

Next Steps