Pathway analysis with next-generation sequencing data

Eur J Hum Genet. 2015 Apr;23(4):507-15. doi: 10.1038/ejhg.2014.121. Epub 2014 Jul 2.

Abstract

Although pathway analysis methods have been developed and successfully applied to association studies of common variants, statistical methods for pathway-based association analysis of rare variants are not yet well developed. Many investigators have observed highly inflated false-positive rates and low power in pathway-based tests of association for rare variants. The inflated false-positive rates and low true-positive rates of current methods are mainly due to their inability to account for gametic phase disequilibrium. To overcome these serious limitations, we develop a novel statistic based on smoothed functional principal component analysis (SFPCA) for pathway association tests with next-generation sequencing data. The statistic captures position-level variant information and accounts for gametic phase disequilibrium. Through intensive simulations, we demonstrate that the SFPCA-based statistic has correct type 1 error rates when testing pathway association with rare variants, common variants, or both. We also evaluate the power of the SFPCA-based statistic and 22 existing statistics, and find that the SFPCA-based statistic has much higher power than the existing statistics in all scenarios considered. To further evaluate its performance, we apply the SFPCA-based statistic to pathway analysis of exome sequencing data from the early-onset myocardial infarction (EOMI) project. We identify three pathways significantly associated with EOMI after Bonferroni correction. In addition, our preliminary results show that the SFPCA-based statistic yields much smaller P-values for pathway association than the other existing methods.
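
The Bonferroni step mentioned in the abstract can be illustrated with a minimal sketch. It assumes one P-value per tested pathway and a family-wise level of 0.05. The function name, pathway labels, and P-values are hypothetical and are not taken from the study. It shows only the multiple-testing correction, not the SFPCA-based statistic itself.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the names whose P-value is at or below alpha / m,
    where m is the number of tests (here, pathways)."""
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m  # Bonferroni-adjusted per-test threshold
    return sorted(name for name, p in p_values.items() if p <= threshold)


# Hypothetical per-pathway P-values, for illustration only.
example = {
    "pathway_A": 1.0e-6,
    "pathway_B": 4.0e-4,
    "pathway_C": 0.03,
    "pathway_D": 0.20,
}

significant = bonferroni_significant(example)
print(significant)
```

With four pathways the per-test threshold is 0.05 / 4 = 0.0125, so only the first two hypothetical pathways pass.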

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Black or African American / genetics
  • Case-Control Studies
  • Computer Simulation
  • Databases, Genetic
  • Exome
  • Gene Frequency
  • Genetic Association Studies
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Models, Genetic
  • Myocardial Infarction / diagnosis*
  • Myocardial Infarction / genetics*
  • Polymorphism, Single Nucleotide
  • Principal Component Analysis
  • Sequence Analysis, DNA
  • Signal Transduction
  • Transforming Growth Factor beta / genetics
  • Transforming Growth Factor beta / metabolism
  • White People / genetics

Substances

  • Transforming Growth Factor beta