Using large sequencing data sets to refine intragenic disease regions and prioritize clinical variant interpretation

Genet Med. 2017 May;19(5):496-504. doi: 10.1038/gim.2016.134. Epub 2016 Sep 22.

Abstract

Purpose: Classification of novel variants is a major challenge facing the widespread adoption of comprehensive clinical genomic sequencing and the field of personalized medicine in general. This is largely because most novel variants do not have functional, genetic, or population data to support their clinical classification.

Methods: To improve variant interpretation, we leveraged the Exome Aggregation Consortium (ExAC) data set (N = ~60,000) as well as 7,000 clinically curated variants in 132 genes identified in more than 11,000 probands clinically tested for cardiomyopathies, rasopathies, hearing loss, or connective tissue disorders to perform a systematic evaluation of domain-level disease associations.

Results: We statistically identify regions that are most sensitive to functional variation in the general population and also most commonly impacted in symptomatic individuals. Our data show that a significant number of exons and domains in genes strongly associated with disease can be defined as disease-sensitive or disease-tolerant, leading to potential reclassification of at least 26% (450 out of 1,742) of variants of uncertain clinical significance in the 132 genes.
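The abstract does not state which statistical test was used, so the following is only a minimal sketch of one way a domain-level comparison of this kind could be set up. It assumes a one-sided Fisher exact test on a 2x2 table of pathogenic variants seen in probands versus variants seen in a population data set such as ExAC, split by whether they fall inside a given domain. The function names, the 0.05 threshold, and the variant counts are all hypothetical and chosen for illustration; only the labels "disease-sensitive" and "disease-tolerant" come from the text.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Returns the probability, under fixed margins, of seeing a count in
    the top-left cell that is at least as large as `a`.
    """
    row1 = a + b          # variants inside the domain
    col1 = a + c          # pathogenic variants overall
    n = a + b + c + d     # all variants in the gene
    upper = min(row1, col1)
    # Sum hypergeometric probabilities for top-left counts a..upper.
    numerator = sum(
        comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, upper + 1)
    )
    return numerator / comb(n, col1)


def classify_domain(path_in, pop_in, path_out, pop_out, alpha=0.05):
    """Label a domain using hypothetical enrichment/depletion rules.

    path_in / path_out: pathogenic case variants inside / outside the domain.
    pop_in / pop_out: population variants inside / outside the domain.
    alpha is an assumed significance threshold, not one from the paper.
    """
    # Enrichment of pathogenic variants inside the domain.
    p_enriched = fisher_exact_greater(path_in, pop_in, path_out, pop_out)
    # Depletion of pathogenic variants equals enrichment of population
    # variants in the same domain, so the columns are swapped.
    p_depleted = fisher_exact_greater(pop_in, path_in, pop_out, path_out)
    if p_enriched < alpha:
        return "disease-sensitive"
    if p_depleted < alpha:
        return "disease-tolerant"
    return "unclassified"


# Made-up counts for a single domain, for illustration only.
print(classify_domain(path_in=20, pop_in=2, path_out=5, pop_out=40))
```

A real analysis would also need multiple-testing correction across all exons and domains, and some handling of allele frequency, neither of which this sketch attempts.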

Conclusion: This approach leverages domain functional annotation and associated disease in each gene to prioritize candidate disease variants, increasing the sensitivity and specificity of novel variant assessment within these genes. Genet Med advance online publication 22 September 2016.

MeSH terms

  • Cardiomyopathies / genetics
  • Connective Tissue Diseases / genetics
  • Databases, Genetic
  • Genetic Association Studies
  • Genetic Predisposition to Disease*
  • Genetic Variation*
  • Hearing Loss / genetics
  • Humans
  • Sequence Analysis, DNA / methods*