Query CADD Scores

CADD (Combined Annotation Dependent Depletion) provides genome-wide predictions of variant deleteriousness. Use CADD scores to prioritize potentially harmful variants in your analysis.

About CADD Scores

CADD integrates multiple genomic annotations to predict the deleteriousness of genetic variants. Higher scores indicate variants more likely to have deleterious effects.

Learn more in the original CADD publication.

CADD Query Syntax

Step 1: Type the field name

Start with cadd to specify the CADD score field you want to search.

Step 2: Add a comparison operator

Use one of the comparison operators =, >, <, >=, <=, or the colon shorthand:

  • cadd: (with a colon) is equivalent to cadd =
  • The same operators and colon shorthand work for any numerical field, such as pos, phyloP, and phastCons

Step 3: Specify the score threshold

Add the numerical value you want to filter by. CADD scores typically range from 0 to 40+, with higher scores indicating more deleterious variants.
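
The three steps above (field name, operator, threshold) can be illustrated with a small Python helper that assembles a filter string. This is a hypothetical sketch and not part of Bystro. It assumes that the colon form is written directly after the field name, with no space, as in the range example later on this page.

```python
# Hypothetical helper (not part of Bystro): assembles a numeric filter string
# from the three parts described above.
ALLOWED_OPERATORS = {"=", ">", "<", ">=", "<=", ":"}


def build_query(field: str, op: str, threshold: float) -> str:
    """Return a filter such as 'cadd > 20' or 'cadd:20'."""
    if op not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {op!r}")
    if op == ":":
        # Colon shorthand, equivalent to '='. Spacing is assumed to match
        # the range example (e.g. 'cadd:[15 TO 20]').
        return f"{field}:{threshold:g}"
    return f"{field} {op} {threshold:g}"


print(build_query("cadd", ">", 20))      # cadd > 20
print(build_query("cadd", ":", 20))      # cadd:20
print(build_query("phyloP", ">=", 2.5))  # phyloP >= 2.5
```

You would type the resulting string into the search box as written.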

Common CADD Score Thresholds

Moderate Impact

CADD ≥ 10

Top 10% most deleterious of all possible single-nucleotide substitutions in the reference genome. A good starting point for variant prioritization.

High Impact

CADD ≥ 20

Top 1% most deleterious variants. Commonly used threshold for pathogenic variant screening.

Very High Impact

CADD ≥ 30

Top 0.1% most deleterious variants. Used for identifying likely pathogenic mutations.
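
These thresholds follow from CADD's PHRED-like scaling. A score S marks the top 10^(-S/10) fraction of all possible substitutions, so every 10 points is a tenfold step. The minimal Python sketch below converts in both directions. The function names are illustrative only.

```python
import math


def top_fraction(phred_score: float) -> float:
    """Fraction of all possible SNVs expected to score at or above phred_score."""
    return 10 ** (-phred_score / 10)


def phred_for_fraction(fraction: float) -> float:
    """Inverse mapping: the PHRED-scaled score that marks the top `fraction`."""
    return -10 * math.log10(fraction)


for score in (10, 20, 30):
    print(f"CADD >= {score}: top {top_fraction(score):.1%}")
# CADD >= 10: top 10.0%
# CADD >= 20: top 1.0%
# CADD >= 30: top 0.1%
```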

CADD Query Examples

Example 1: High-Impact Variants

Search for variants with CADD scores greater than 20 using cadd > 20:

Finding high-impact variants with CADD scores above 20 (top 1% most deleterious)

Example 2: Inclusive Threshold

To also include variants scoring exactly 20, use cadd >= 20:

Using the >= operator to include variants with CADD scores of exactly 20

Example 3: Score Range Query

Search for variants within a specific CADD range using cadd:[15 TO 20]:

Finding variants with moderate CADD scores between 15 and 20
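
The three example queries can be compared on a toy list of scores with a minimal Python sketch. This only mimics the filters and is not Bystro's search engine. The scores are made up. The sketch treats [15 TO 20] as inclusive of both endpoints, as in Lucene-style range syntax where square brackets include the bounds, and that reading is an assumption.

```python
import operator

# Made-up CADD scores, one per variant, for illustration only.
scores = [5.2, 15.0, 18.7, 20.0, 24.3, 31.9]

COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
}


def compare(values, op, threshold):
    """Keep values satisfying `value <op> threshold`."""
    return [v for v in values if COMPARATORS[op](v, threshold)]


def in_range(values, low, high):
    """Assumed inclusive range, mimicking cadd:[low TO high]."""
    return [v for v in values if low <= v <= high]


print(compare(scores, ">", 20))   # [24.3, 31.9]
print(compare(scores, ">=", 20))  # [20.0, 24.3, 31.9]
print(in_range(scores, 15, 20))   # [15.0, 18.7, 20.0]
```

The only difference between the first two results is the variant scoring exactly 20.0.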

CADD Scores and Indels

Important Considerations

CADD's genome-wide precomputed scores cover single-nucleotide variants (SNVs) only. For indels, Bystro reports the CADD scores of every affected reference position, so you can filter indels by predicted deleteriousness as well.

Single Base Indels

Bystro reports all three CADD scores at the affected position (one for each possible alternate base). This treats the indel as potentially as damaging as the most deleterious SNV at that site.

Longer Indels

  • Deletions: the first 32 bases covered by the deletion are annotated, with values separated by "|"
  • Insertions: the two reference positions flanking the insertion are annotated

Indel Query Behavior

When querying cadd > 20, an indel with CADD scores of 0, 10, and 25 will match because one of its scores (25) exceeds the threshold.

This behavior applies to any field containing multiple values separated by "|".
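
The "any value matches" rule can be sketched in Python for a single "|"-separated field. This assumes the raw field is a plain string of numbers such as "0|10|25". Bystro applies the rule inside its own search engine, and `field_matches` is a hypothetical name.

```python
import operator

# Maps query operators to Python comparisons.
COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
}


def field_matches(raw_value: str, op: str, threshold: float) -> bool:
    """Return True if ANY '|'-separated value satisfies `value <op> threshold`."""
    values = [float(v) for v in raw_value.split("|") if v.strip()]
    return any(COMPARATORS[op](v, threshold) for v in values)


print(field_matches("0|10|25", ">", 20))  # True: 25 > 20
print(field_matches("0|10|15", ">", 20))  # False: no value exceeds 20
print(field_matches("0|10|25", "<", 5))   # True: 0 < 5
```

For > and >=, this is the same as testing the field's maximum score. If the same any-match rule also applies to < and <=, then a single indel can match both cadd > 20 and cadd < 5.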

Practical Applications

Clinical Variant Prioritization

  • Use cadd >= 20 for initial pathogenic variant screening
  • Combine with allele frequency filters for rare, high-impact variants
  • Focus on coding regions with high CADD scores
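
Combining a CADD threshold with an allele frequency filter can be sketched in Python on hypothetical in-memory records. The field names `cadd` and `af` are placeholders rather than Bystro's actual annotation fields, and the 1% "rare" cutoff is an assumed example value.

```python
# Hypothetical annotated records; field names and values are illustrative only.
variants = [
    {"id": "var1", "cadd": 27.5, "af": 0.0004},
    {"id": "var2", "cadd": 23.1, "af": 0.12},
    {"id": "var3", "cadd": 8.9, "af": 0.0001},
    {"id": "var4", "cadd": 20.0, "af": 0.005},
]


def prioritize(records, min_cadd=20.0, max_af=0.01):
    """Keep rare (af < max_af), high-impact (cadd >= min_cadd) variants, highest CADD first."""
    hits = [r for r in records if r["cadd"] >= min_cadd and r["af"] < max_af]
    return sorted(hits, key=lambda r: r["cadd"], reverse=True)


print([r["id"] for r in prioritize(variants)])  # ['var1', 'var4']
```

Here var2 is dropped because it is common, and var3 is dropped because its CADD score is below 20.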

Research Applications

  • Compare CADD distributions between case and control groups
  • Identify variants likely to disrupt protein function
  • Prioritize variants for functional validation studies

CADD Score Best Practices

Combine with other filters: Use CADD scores alongside allele frequency and functional annotations

Consider context: High CADD scores in non-coding regions may indicate regulatory disruption

Validate findings: CADD scores are predictions, so validate high-scoring variants experimentally when possible

Population differences: Consider population-specific deleteriousness patterns in diverse cohorts

Performance Note

Dataset used in examples: 1000 Genomes Project (73,452,337 variants in 27,192 genes, queries typically complete in ~0.5 seconds)