Add Sample Information
Link your genomic samples to subject-level information including phenotypes, demographics, and other covariates that may be important for your analysis. This step connects your sequence data to the clinical and experimental context.
When to Add Sample Information
Add sample information after submitting your genomic data for annotation but before running advanced analyses. This allows you to:
- ▶Search and filter results by clinical phenotypes
- ▶Group samples by case/control status or other categories
- ▶Account for confounding variables in statistical analyses
- ▶Link multiple samples from the same subject
Required File Format
Required Columns
Your covariate file must include these three columns with exact header names:
- ▶Experiment Name: Name of your study or experiment
- ▶Sample ID: Must match the IDs in your genomic data
- ▶Subject ID: Links multiple samples to the same individual
⚠️ Common Error
"Your file is missing one or more of the required headers: Experiment Name, Sample ID, Subject ID"
This error appears when any of the three required column headers is missing or misspelled. Check that your headers match exactly, including capitalization and spacing (the comparison is case-sensitive).
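You can catch this error locally before uploading. The following is a minimal sketch using only Python's standard library; the function name `missing_required_headers` is hypothetical and is not part of Bystro. It assumes the header check is an exact, case-sensitive string comparison, as the error message suggests.

```python
import csv
import io

# The three required headers, spelled exactly as this guide lists them.
REQUIRED_HEADERS = ("Experiment Name", "Sample ID", "Subject ID")


def missing_required_headers(text, delimiter=","):
    """Return the required headers that are absent from the first row of text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, [])  # an empty file yields an empty header
    # Exact comparison: "Sample id" or "SampleID" counts as missing.
    return [name for name in REQUIRED_HEADERS if name not in header]


good = (
    "Experiment Name,Sample ID,Subject ID,Sex\n"
    "Cardiac_Study_2024,SAMPLE001,SUBJ001,Female\n"
)
bad = (
    "Experiment name,SampleID,Subject ID\n"
    "Cardiac_Study_2024,SAMPLE001,SUBJ001\n"
)

print(missing_required_headers(good))  # []
print(missing_required_headers(bad))   # ['Experiment Name', 'Sample ID']
```

For a TSV file, pass `delimiter="\t"`. When reading from disk, opening the file with `encoding="utf-8-sig"` strips a byte-order mark that some spreadsheet exports prepend to the first header.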
Additional Covariate Columns
Beyond the three required columns, you can include any additional information relevant to your study. Common examples include:
- ▶Clinical: Case/control status, age of onset, diagnosis, treatment response
- ▶Demographics: Sex, age at collection, ancestry, socioeconomic factors
- ▶Experimental: Collection date, site, sample type, processing batch
- ▶Study-specific: Environmental exposures, family relationships, quality metrics
💡 The system is flexible: include any variables that might be relevant to your analysis or could act as confounders.
Example File Format
Sample CSV File:
Experiment Name,Sample ID,Subject ID,Sex,Age,Case_Control,Diagnosis,Collection_Date
Cardiac_Study_2024,SAMPLE001,SUBJ001,Female,45,Case,Hypertrophic_Cardiomyopathy,2024-01-15
Cardiac_Study_2024,SAMPLE002,SUBJ002,Male,52,Control,Healthy,2024-01-16
Cardiac_Study_2024,SAMPLE003,SUBJ003,Female,38,Case,Dilated_Cardiomyopathy,2024-01-17
Cardiac_Study_2024,SAMPLE004,SUBJ001,Female,45,Case,Hypertrophic_Cardiomyopathy,2024-06-15
Note: Subject SUBJ001 has two samples (baseline and 6-month follow-up)
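To see how the Subject ID column ties samples together, the sketch below groups the example rows by subject and reports subjects with more than one sample. It is a local illustration only; the helper name `samples_by_subject` is hypothetical, and how Bystro performs this linking internally is not described here.

```python
import csv
import io
from collections import defaultdict

# The example file from this section, embedded so the sketch is self-contained.
EXAMPLE = """\
Experiment Name,Sample ID,Subject ID,Sex,Age,Case_Control,Diagnosis,Collection_Date
Cardiac_Study_2024,SAMPLE001,SUBJ001,Female,45,Case,Hypertrophic_Cardiomyopathy,2024-01-15
Cardiac_Study_2024,SAMPLE002,SUBJ002,Male,52,Control,Healthy,2024-01-16
Cardiac_Study_2024,SAMPLE003,SUBJ003,Female,38,Case,Dilated_Cardiomyopathy,2024-01-17
Cardiac_Study_2024,SAMPLE004,SUBJ001,Female,45,Case,Hypertrophic_Cardiomyopathy,2024-06-15
"""


def samples_by_subject(text):
    """Map each Subject ID to the list of Sample IDs recorded for it."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        groups[row["Subject ID"]].append(row["Sample ID"])
    return dict(groups)


groups = samples_by_subject(EXAMPLE)
# Keep only subjects that contributed more than one sample.
repeated = {subject: ids for subject, ids in groups.items() if len(ids) > 1}
print(repeated)  # {'SUBJ001': ['SAMPLE001', 'SAMPLE004']}
```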
Upload Process
Prepare Your File
Create a CSV or TSV (tab-separated) file with the required columns and your covariate data.
- ▶Ensure Sample IDs exactly match those in your genomic data
- ▶Use consistent naming for categorical variables
- ▶Avoid spaces and special characters in the headers of additional covariate columns (the three required headers must keep their spaces exactly as shown)
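If your covariates live in a script or database rather than a spreadsheet, you can generate the file programmatically. This is a sketch under stated assumptions: the `FIELDS` list, the sample rows, and the `write_covariates` helper are illustrative and not prescribed by Bystro. Using Python's `csv` module rather than joining strings by hand means values containing commas are quoted correctly.

```python
import csv
import io

# Required headers first, followed by two illustrative covariate columns.
FIELDS = ["Experiment Name", "Sample ID", "Subject ID", "Sex", "Case_Control"]

rows = [
    {"Experiment Name": "Cardiac_Study_2024", "Sample ID": "SAMPLE001",
     "Subject ID": "SUBJ001", "Sex": "Female", "Case_Control": "Case"},
    {"Experiment Name": "Cardiac_Study_2024", "Sample ID": "SAMPLE002",
     "Subject ID": "SUBJ002", "Sex": "Male", "Case_Control": "Control"},
]


def write_covariates(rows, handle, delimiter=","):
    """Write a header row and one line per sample to an open text handle."""
    writer = csv.DictWriter(
        handle, fieldnames=FIELDS, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_covariates(rows, buffer)           # CSV
print(buffer.getvalue())

tsv_buffer = io.StringIO()
write_covariates(rows, tsv_buffer, "\t")  # TSV
```

To write to disk instead of an in-memory buffer, open the file with `open(path, "w", newline="", encoding="utf-8")` and pass that handle to the same function.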
Navigate to Upload
In the Bystro interface:
- ▶Look for the "Add subject level info" section
- ▶Enter your experiment name in the text field
- ▶Upload your prepared covariate file
Link to Existing Experiment (Optional)
If you want to search by de-anonymized subject IDs:
- ▶Select "Add New Experiment"
- ▶Choose an existing name mapping or upload a new template
Troubleshooting
Missing Required Headers
Double-check that your file includes exactly these column names: Experiment Name, Sample ID, Subject ID
Sample ID Mismatch
Sample IDs in your covariate file must exactly match those in your genomic data. Check for extra spaces, different capitalization, or missing samples.
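The checks described above can be sketched as a small local comparison. It assumes you have already collected the sample IDs from your genomic data into a list (how you obtain them depends on your file format); the function name `diagnose_sample_ids` and the sample values are hypothetical.

```python
def diagnose_sample_ids(covariate_ids, genomic_ids):
    """Classify covariate IDs that do not exactly match any genomic ID.

    Returns (near_misses, unmatched, missing_from_covariates):
      near_misses: covariate ID -> genomic ID it matches once surrounding
                   whitespace and capitalization are ignored
      unmatched: covariate IDs with no counterpart at all
      missing_from_covariates: genomic IDs with no covariate row
    """
    genomic = set(genomic_ids)
    # Lookup table keyed by a normalized form of each genomic ID.
    normalized = {g.strip().lower(): g for g in genomic}

    near_misses = {}
    unmatched = []
    for cid in covariate_ids:
        if cid in genomic:
            continue  # exact match, nothing to report
        candidate = normalized.get(cid.strip().lower())
        if candidate is not None:
            near_misses[cid] = candidate
        else:
            unmatched.append(cid)

    accounted_for = (set(covariate_ids) & genomic) | set(near_misses.values())
    missing_from_covariates = sorted(genomic - accounted_for)
    return near_misses, unmatched, missing_from_covariates


genomic_ids = ["SAMPLE001", "SAMPLE002", "SAMPLE003", "SAMPLE004"]
covariate_ids = ["SAMPLE001", " SAMPLE002", "sample003", "SAMPLE005"]

near, unmatched, missing = diagnose_sample_ids(covariate_ids, genomic_ids)
print(near)       # {' SAMPLE002': 'SAMPLE002', 'sample003': 'SAMPLE003'}
print(unmatched)  # ['SAMPLE005']
print(missing)    # ['SAMPLE004']
```

Near misses usually point to a stray space or a capitalization difference that can be corrected in the covariate file; unmatched and missing IDs need a closer look at which samples were actually submitted.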
File Format Issues
Save your file in CSV (comma-separated) or TSV (tab-separated) format. Excel files (.xlsx) are not currently supported. If you use Excel to prepare your data, export the sheet as CSV before uploading.
Best Practices
- ✓Plan your covariates early: Before data collection, think about which factors might affect your analysis
- ✓Use consistent coding: For categorical variables, use the same spelling and capitalization throughout
- ✓Document your variables: Keep a separate file explaining what each column represents
- ✓Handle missing data thoughtfully: Use consistent codes for missing values (e.g., "NA", "Unknown")
- ✓Validate before upload: Check that all Sample IDs in your covariate file exist in your genomic data
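The "consistent coding" practice is easy to check mechanically. The sketch below flags categorical values that differ only in capitalization or surrounding whitespace, such as "Female" and "female". The function name `inconsistent_codes`, the column choices, and the sample data are illustrative assumptions, not part of Bystro.

```python
import csv
import io
from collections import defaultdict


def inconsistent_codes(text, columns):
    """Return, per column, groups of values that differ only by case or whitespace."""
    seen = {col: defaultdict(set) for col in columns}
    for row in csv.DictReader(io.StringIO(text)):
        for col in columns:
            value = row[col]
            # Values that normalize to the same key are treated as one category.
            seen[col][value.strip().lower()].add(value)

    report = {}
    for col, groups in seen.items():
        clashes = [sorted(variants) for variants in groups.values() if len(variants) > 1]
        if clashes:
            report[col] = clashes
    return report


SAMPLE = """\
Experiment Name,Sample ID,Subject ID,Sex,Case_Control
Cardiac_Study_2024,SAMPLE001,SUBJ001,Female,Case
Cardiac_Study_2024,SAMPLE002,SUBJ002,Male,Control
Cardiac_Study_2024,SAMPLE003,SUBJ003,female,case
Cardiac_Study_2024,SAMPLE004,SUBJ004,NA,Control
"""

report = inconsistent_codes(SAMPLE, ["Sex", "Case_Control"])
print(report)
# {'Sex': [['Female', 'female']], 'Case_Control': [['Case', 'case']]}
```

An empty report means each category is spelled one way throughout the checked columns. The same pass is a convenient place to confirm that missing values use a single code.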
💡 Pro Tip
Start with a small test file containing a few samples to make sure the upload process works correctly before uploading your full dataset. This helps catch formatting issues early and saves time.
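One way to build such a test file is to copy the header plus the first few rows of your full covariate file. This sketch assumes each record occupies a single line (no quoted fields containing line breaks); the helper name `head_of_file` is hypothetical.

```python
import io
from itertools import islice


def head_of_file(source, destination, n_samples=5):
    """Copy the header line plus the first n_samples data lines."""
    for line in islice(source, n_samples + 1):  # +1 accounts for the header
        destination.write(line)


full = io.StringIO(
    "Experiment Name,Sample ID,Subject ID,Sex\n"
    "Cardiac_Study_2024,SAMPLE001,SUBJ001,Female\n"
    "Cardiac_Study_2024,SAMPLE002,SUBJ002,Male\n"
    "Cardiac_Study_2024,SAMPLE003,SUBJ003,Female\n"
)
subset = io.StringIO()
head_of_file(full, subset, n_samples=2)
print(subset.getvalue())
```

With real files, pass two handles opened with `open(...)` instead of the in-memory buffers. The Sample IDs in the test file still need to match samples present in the genomic data you are testing against.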
Next Steps
After successfully uploading your sample information, you can:
- ▶Filter and search your results using the uploaded covariates
- ▶Set up case/control comparisons for association testing
- ▶Download and analyze your results with integrated phenotype data