Result Files Description

When your annotation completes, Bystro generates a set of output files containing annotated variants, quality control metrics, ancestry analysis, polygenic risk scores, and machine learning-ready datasets. Here's what each file contains and how to use it.

Download Package

Your results are delivered as a compressed tarball containing multiple files. Extract the archive to access individual components for downstream analysis.

Core Annotation Files

sample_vcf.annotation.tsv.gz

Main annotation output - Tab-separated file with comprehensive variant annotations, one row per variant with extensive genomic information.

Use for: Detailed variant analysis, filtering by annotation criteria, identifying functional variants, generating custom reports

Format: Block gzipped TSV (decompress with bgzip, gzip, or pigz)

Large File Warning

This file can be enormous (up to billions of variant rows for cohorts of thousands of samples). For large cohorts, do not open it directly in Excel, which is limited to 1,048,576 rows per worksheet. Use the Bystro web interface to filter and subset the data, then download the smaller filtered results for any spreadsheet analysis.
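For scripted work outside the web interface, one option is to stream the file row by row rather than loading it whole. The following is a minimal sketch using only the Python standard library. The function name `iter_annotation_rows` is illustrative, and the sketch assumes the first line of the file is a tab-separated header row; check the actual column names against sample_vcf.annotation.header.json.

```python
import gzip
from typing import Dict, Iterator


def iter_annotation_rows(path: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per variant row without loading the whole file.

    Assumption: the first line is a tab-separated header row.
    Python's gzip module can read block-gzipped (bgzip) files because
    they are valid multi-member gzip streams.
    """
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        header = handle.readline().rstrip("\n").split("\t")
        for line in handle:
            values = line.rstrip("\n").split("\t")
            yield dict(zip(header, values))
```

Because the function is a generator, memory use stays roughly constant regardless of file size. Wrapping it in `itertools.islice(iter_annotation_rows(path), 1000)` previews the first 1,000 rows.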

sample_vcf.dosage.feather

Genotype dosage matrix - Machine learning-ready format with variant dosages (0, 1, 2) for each sample, optimized for polygenic risk score calculations.

Use for: Polygenic risk scores, GWAS, machine learning, statistical genetics analyses

Format: Arrow Feather V2 (supported by Python pandas, polars, R, Julia)

Structure: First column = chr:pos:ref:alt, remaining columns = sample dosages
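The variant identifier in the first column can be split into typed fields when you need to join dosages against other data. This is a small sketch, assuming identifiers follow the chr:pos:ref:alt layout described above with exactly four colon-separated parts. The names `VariantKey` and `parse_variant_id` are illustrative, and how Bystro writes chromosome names or indel alleles is not specified here.

```python
from typing import NamedTuple


class VariantKey(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_variant_id(variant_id: str) -> VariantKey:
    """Split a 'chr:pos:ref:alt' identifier into typed fields."""
    parts = variant_id.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected chr:pos:ref:alt, got {variant_id!r}")
    chrom, pos, ref, alt = parts
    # Position is converted to int so variants sort numerically.
    return VariantKey(chrom, int(pos), ref, alt)
```

After loading the matrix with `pd.read_feather`, the function could be applied to the first column to obtain chromosome and position for sorting or region filtering.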

Quality Control & Metadata

Sample Information & Statistics

sample_vcf.sample_list - List of included samples
sample_vcf.statistics.tsv - Sample QC statistics (TSV format)
sample_vcf.statistics.json - Sample QC statistics (JSON format)
sample_vcf.statistics.qc.tsv - Samples that failed QC (outliers by more than 3 standard deviations)
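To illustrate what a 3 standard deviation threshold means, the sketch below flags samples whose value for a single metric lies far from the cohort mean. It is not Bystro's implementation: which metrics are tested and how the threshold is computed are not stated here, and `flag_outliers` is a hypothetical helper.

```python
from statistics import mean, stdev
from typing import Dict, List


def flag_outliers(values: Dict[str, float], n_sd: float = 3.0) -> List[str]:
    """Return sample IDs whose value is more than n_sd standard
    deviations away from the mean of all samples."""
    data = list(values.values())
    if len(data) < 2:
        return []  # a standard deviation needs at least two samples
    center = mean(data)
    spread = stdev(data)
    if spread == 0:
        return []  # all samples identical, nothing to flag
    return [s for s, v in values.items() if abs(v - center) > n_sd * spread]
```

Note that with very small cohorts no sample can exceed 3 standard deviations, so a rule of this kind is only informative for larger sample sets.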

Configuration & Documentation

hg19.yml / hg38.yml - Annotation configuration file
sample_vcf.annotation.header.json - Column descriptions
sample_vcf.annotation.log.txt - Processing log file
bystro_annotation.complete - Completion marker
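A script that consumes the results can check for the completion marker before reading anything else. This is a minimal sketch that assumes the files listed above sit directly inside the extracted directory and share the sample_vcf prefix. Both function names are illustrative, and the internal layout of the header JSON is not described here, so it is returned as parsed without interpretation.

```python
import json
from pathlib import Path
from typing import Any


def annotation_is_complete(results_dir: str) -> bool:
    """True if the completion marker file exists in results_dir."""
    return (Path(results_dir) / "bystro_annotation.complete").is_file()


def load_column_descriptions(results_dir: str, prefix: str = "sample_vcf") -> Any:
    """Parse <prefix>.annotation.header.json and return its content as-is."""
    path = Path(results_dir) / f"{prefix}.annotation.header.json"
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```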

Performance & Cache Files

sample_vcf.dosage.feather.index.gz - Dosage matrix index
filtered_dosage_matrices - Cached filtered data

Advanced Analysis Results

Polygenic Risk Scores (PRS)

Risk predictions based on GWAS summary statistics. Each GWAS produces a separate PRS file.

Example files:
• sample_vcf.AD.30617256.prs.tsv (Alzheimer's Disease GWAS)
• sample_vcf.AD.35379992.prs.tsv (Different AD GWAS)
• sample_vcf.IBD.PMIDLIU.prs.tsv (Inflammatory Bowel Disease)
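When several PRS files are present, the trait and GWAS identifier can be read from the file name. The sketch below assumes the pattern <prefix>.<trait>.<gwas_id>.prs.tsv, which is inferred from the three example names above rather than from a documented naming rule. `PrsFile` and `parse_prs_filename` are illustrative names.

```python
from pathlib import Path
from typing import NamedTuple, Optional


class PrsFile(NamedTuple):
    prefix: str
    trait: str
    gwas_id: str


def parse_prs_filename(filename: str) -> Optional[PrsFile]:
    """Split '<prefix>.<trait>.<gwas_id>.prs.tsv' into its parts.

    Returns None when the name does not match the assumed pattern.
    """
    name = Path(filename).name
    suffix = ".prs.tsv"
    if not name.endswith(suffix):
        return None
    # Split from the right so a prefix containing dots stays intact.
    parts = name[: -len(suffix)].rsplit(".", 2)
    if len(parts) != 3:
        return None
    return PrsFile(*parts)
```

For example, `parse_prs_filename("sample_vcf.AD.30617256.prs.tsv")` yields trait `AD` and GWAS identifier `30617256`.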

Individual Scores

Per-sample risk predictions for each trait

UI Download

Download CSV files directly from the web interface

Ancestry Analysis

Genetic ancestry inference using principal component analysis and reference populations.

File: ancestry_results.json

CLI Access

Use the Bystro CLI to convert the JSON results to CSV format

UI Download

Download CSV directly from the web interface

Variant Reports

Curated reports highlighting variants of clinical or research significance.

Access: Available for download through the Dashboard interface

Working with Your Results

Step 1: Extract the Archive

Your results come as a compressed file (ending in .tar.gz). You'll need to extract it to access the individual files.

💻 Windows

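Built-in: On recent Windows 11 versions, right-click the file → "Extract All". Windows 10 and older Windows 11 releases do not open .tar.gz files from File Explorer; there, run the tar command shown under Linux / Command Line in Command Prompt or PowerShell (tar ships with Windows 10 version 1803 and later).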

Alternative: Download 7-Zip (free) if built-in extraction doesn't work

🍎 Mac

Built-in: Double-click the file - it will extract automatically

🐧 Linux / Command Line

tar -xzf sample_vcf_results.tar.gz
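The same extraction can be done from Python, which is convenient inside an analysis script or on systems without a tar command. This is a minimal sketch using the standard library `tarfile` module. The function name `extract_results` is illustrative, and the archive name is taken from the command above.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_results(archive_path: str, dest_dir: str) -> List[str]:
    """Extract a .tar.gz results archive and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as archive:
        names = archive.getnames()
        # Use the safer "data" extraction filter where this Python
        # version supports it; older versions extract without it.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        archive.extractall(path=dest, **extra)
    return names
```

Calling `extract_results("sample_vcf_results.tar.gz", "results")` unpacks the files into a results directory and returns the list of extracted names, which is a quick way to confirm which outputs are present.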

Step 2: Choose Your Analysis Path

Detailed Variant Analysis

Use the .tsv.gz file for comprehensive variant annotation analysis

Machine Learning

Use the .feather file for ML workflows and statistical genetics

Step 3: Quality Control Review

Check the statistics files to identify samples that may need to be excluded from downstream analysis.

Step 4: Access Advanced Results

Download ancestry and PRS results directly from the web interface or convert using CLI tools.

File Format Details

Compression Formats

  • Block gzip (.tsv.gz): Decompress with bgzip, gzip, or pigz
  • Arrow Feather (.feather): Native binary format, no decompression needed

Data Loading Examples

Python (Pandas)

import pandas as pd
df = pd.read_feather('sample_vcf.dosage.feather')

R

library(arrow)
df <- read_feather('sample_vcf.dosage.feather')

Pro Tips

  • Filter in the UI before downloading for Excel exploration - annotation files can be too large for spreadsheets
  • Check QC files to identify potential sample quality issues
  • Keep the config file for reproducibility and method documentation
  • Use the CLI tools or the Dashboard download to get ancestry and PRS results in CSV format

Next Steps

After understanding your result files: