OSS Algorithms
At Bystro, we believe natural language is the right interface for genetic and proteomic analysis. We are building the world's first LLM-powered natural language analysis engine that takes your questions about complex genetic and proteomic datasets and converts them into statistical answers with easy-to-understand summaries and visualizations.
This is our open-source collection of machine learning methods for high-dimensional statistics, with applications in genomics and proteomics. We're working to integrate these methods into the Bystro natural language analysis platform. Our current platform automates analyses like PRS, ancestry calculation, and QC for genetics data.
Installation
Install the Bystro Python package:
pip install bystro
Machine Learning Methods
Covariance Matrix Estimation and Hypothesis Testing
from bystro.covariance import *
Regularized covariance matrix estimation methods well suited for smaller sample size regimes where n << p.
Covariance matrix hypothesis tests, including the two-sample covariance test:
from bystro.random_matrix_theory.rmt4ds_cov_test import two_sample_cov_test
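To illustrate why regularization matters when n << p, here is a minimal NumPy sketch of linear shrinkage toward a scaled identity. It does not use the bystro.covariance API: the function name shrinkage_covariance and the fixed weight alpha are assumptions made for illustration, and the Bystro estimators may select the amount of shrinkage differently.

```python
import numpy as np


def shrinkage_covariance(X, alpha=0.1):
    """Linear shrinkage of the sample covariance toward a scaled identity.

    Hypothetical helper for illustration only; not part of bystro.
    X has shape (n_samples, n_features); alpha in [0, 1] is the shrinkage weight.
    """
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    S = np.cov(X, rowvar=False)   # p x p sample covariance
    mu = np.trace(S) / p          # average variance, sets the identity scale
    return (1.0 - alpha) * S + alpha * mu * np.eye(p)


rng = np.random.default_rng(0)
X = rng.standard_normal((20, 100))   # n = 20 samples, p = 100 features

S = np.cov(X, rowvar=False)
S_shrunk = shrinkage_covariance(X, alpha=0.2)

# The sample covariance is rank-deficient when n < p, so it cannot be inverted.
rank_sample = np.linalg.matrix_rank(S)
# Shrinkage lifts every eigenvalue away from zero, making the estimate invertible.
min_eig_shrunk = np.linalg.eigvalsh(S_shrunk).min()
```

With n = 20 and p = 100 the sample covariance has rank at most n - 1, while the shrunk estimate is positive definite for any alpha > 0.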
Random Matrix Theory Methods
from bystro.random_matrix_theory import *
Foundational modules for significance tests, including two_sample_cov_test.
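As background for how random matrix theory can support a significance test, the sketch below compares the largest eigenvalue of a sample covariance matrix with the upper edge of the Marchenko-Pastur distribution, which bounds the eigenvalues of pure-noise data asymptotically. This is a generic NumPy illustration, not the Bystro implementation: mp_upper_edge and top_eigenvalue are hypothetical helpers, and unit noise variance is assumed.

```python
import numpy as np


def mp_upper_edge(n_samples, n_features, sigma2=1.0):
    """Upper edge of the Marchenko-Pastur law for i.i.d. noise of variance sigma2."""
    gamma = n_features / n_samples
    return sigma2 * (1.0 + np.sqrt(gamma)) ** 2


def top_eigenvalue(X):
    """Largest eigenvalue of the sample covariance of X (rows are samples)."""
    return np.linalg.eigvalsh(np.cov(X, rowvar=False))[-1]


rng = np.random.default_rng(0)
n, p = 500, 100
edge = mp_upper_edge(n, p)

# Pure noise: the population covariance is the identity.
noise = rng.standard_normal((n, p))

# Spiked data: one direction carries extra variance (population covariance I + 5 * v v^T).
v = np.zeros(p)
v[0] = 1.0
spiked = noise + np.sqrt(5.0) * rng.standard_normal((n, 1)) * v

top_noise = top_eigenvalue(noise)     # expected to sit near the edge
top_spiked = top_eigenvalue(spiked)   # expected to sit well above the edge
```

An eigenvalue far above the edge suggests structure beyond noise; a calibrated test would also account for finite-sample fluctuations around the edge.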
Stochastic Gradient Langevin
from bystro.stochastic_gradient_langevin import *
Implementation of the Stochastic Gradient Langevin algorithm. Read the paper →
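For readers new to the algorithm, here is a minimal NumPy sketch of stochastic gradient Langevin dynamics on a toy problem: sampling the posterior of a Gaussian mean with unit observation variance. It does not call bystro.stochastic_gradient_langevin; the function name, the toy model, and the constant step size (the original algorithm uses a decreasing schedule) are all assumptions made for illustration.

```python
import numpy as np


def sgld_gaussian_mean(data, n_steps=5000, batch_size=32,
                       step_size=1e-4, prior_var=10.0, seed=0):
    """Hypothetical SGLD sampler for the mean of N(theta, 1) data.

    Prior: theta ~ N(0, prior_var). Returns the chain of theta values.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    theta = 0.0
    samples = np.empty(n_steps)
    for t in range(n_steps):
        # Minibatch drawn with replacement from the full data set.
        batch = data[rng.integers(0, n, size=batch_size)]
        grad_prior = -theta / prior_var
        # Minibatch gradient of the log-likelihood, rescaled to the full data set.
        grad_lik = (n / batch_size) * np.sum(batch - theta)
        # Half-step along the stochastic gradient plus Gaussian noise of variance step_size.
        noise = rng.normal(0.0, np.sqrt(step_size))
        theta = theta + 0.5 * step_size * (grad_prior + grad_lik) + noise
        samples[t] = theta
    return samples


rng = np.random.default_rng(0)
data = rng.normal(2.0, 1.0, size=1000)

samples = sgld_gaussian_mean(data)
# Discard the first 1000 draws as burn-in before summarizing.
posterior_mean_estimate = samples[1000:].mean()
```

With a weak prior the posterior mean is close to the sample mean, so posterior_mean_estimate should land near data.mean(). Because the step size is held constant, the spread of the chain only approximates the true posterior variance.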
Fair ML / Supervised PPCA / Variational Principal Component Regression
from bystro.supervised_ppca import *
supervised_ppca is a collection of generative methods:
Applications in Proteomics
Four modular steps that can be applied alone or combined:
Applications in Genetics
Make genetic results more generalizable by removing information from confounding factors:
Remove ancestry-related information in multi-ancestry cohorts to reduce bias.
Remove the effect of batch in meta-analyses.
See the Fair PCA demo for a worked example. Fair PCA Demo →
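The idea of removing confounder information can be sketched with plain linear residualization in NumPy. This is deliberately simpler than the generative Fair PCA approach and does not use the bystro.supervised_ppca API; the function residualize, the simulated ancestry covariates, and the data dimensions are assumptions made for illustration.

```python
import numpy as np


def residualize(X, C):
    """Remove the linear effect of covariates C from every column of X.

    Hypothetical helper: regresses X on C (plus an intercept) by least
    squares and returns the residuals.
    """
    X = np.asarray(X, dtype=float)
    C = np.asarray(C, dtype=float)
    C1 = np.column_stack([np.ones(C.shape[0]), C])   # add intercept column
    beta, *_ = np.linalg.lstsq(C1, X, rcond=None)
    return X - C1 @ beta


rng = np.random.default_rng(1)
n, p = 200, 50

# Simulated confounder: two ancestry-like covariates that leak into every feature.
ancestry = rng.standard_normal((n, 2))
loadings = rng.standard_normal((2, p))
X = ancestry @ loadings + 0.5 * rng.standard_normal((n, p))

X_clean = residualize(X, ancestry)

# Least-squares residuals are orthogonal to the covariates, up to rounding error.
max_cross_product = np.abs(ancestry.T @ X_clean).max()
```

Residualization removes only linear dependence on the covariates; nonlinear ancestry or batch effects would remain in X_clean.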
Combined (Multi-omics) Applications
Combine genomic and proteomic data for downstream analyses or data exploration. Read the proteomics README →
Publications
If you use the Bystro Python package in your research, please cite:
Kotlar et al. "Bystro: rapid online variant annotation and natural-language filtering at whole-genome scale." Genome Biology 19, 14 (2018). https://doi.org/10.1186/s13059-018-1387-3