A unified sparse representation for sequence variant identification for complex traits

Genet Epidemiol. 2014 Dec;38(8):671-9. doi: 10.1002/gepi.21849. Epub 2014 Sep 4.

Abstract

Joint adjustment for cryptic relatedness and population structure is necessary to reduce bias in DNA sequence analysis; however, existing sparse regression methods model these two confounders separately. Incorporating prior biological information has great potential to enhance statistical power, but such information is often overlooked in existing sparse regression models. We developed a unified sparse regression (USR) to incorporate prior information and jointly adjust for cryptic relatedness, population structure, and other environmental covariates. Our USR models cryptic relatedness as a random effect and population structure as a fixed effect, and uses weighted penalties to incorporate prior knowledge. As demonstrated by extensive simulations, our USR algorithm can discover more true causal variants and maintain a lower false discovery rate than several commonly used feature selection methods. It can handle rare and common variants simultaneously. Applying our USR algorithm to DNA sequence data of Mexican Americans from GAW18, we replicated three hypertension pathways, demonstrating its effectiveness in identifying susceptibility genetic variants.
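
The modeling idea described above (a random effect for relatedness, fixed effects for population structure, and weighted penalties for prior knowledge) can be illustrated with a minimal sketch. This is not the authors' USR algorithm: it assumes a known kinship matrix `K`, a fixed variance ratio `h2` (USR presumably estimates variance components), and a plain coordinate-descent weighted lasso. The function name `usr_like_sketch` and all parameter names are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by lasso coordinate descent."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def usr_like_sketch(y, X, Z, K, weights, lam, h2=0.5, n_iter=200):
    """Weighted lasso on data whitened by a kinship-based covariance.

    y: (n,) trait; X: (n, p) variants (penalized);
    Z: (n, q) unpenalized fixed effects (intercept, ancestry PCs, covariates);
    K: (n, n) kinship matrix; weights: (p,) penalty weights
    (smaller weight = stronger prior support for that variant).
    h2 is an ASSUMED fixed variance ratio, not an estimate.
    """
    n, p = X.shape
    # Covariance implied by the random effect: V = h2*K + (1-h2)*I.
    V = h2 * K + (1.0 - h2) * np.eye(n)
    evals, evecs = np.linalg.eigh(V)
    W = (evecs * evals ** -0.5) @ evecs.T  # V^{-1/2}
    yw, Xw, Zw = W @ y, W @ X, W @ Z
    # Project out the unpenalized fixed effects.
    Q, _ = np.linalg.qr(Zw)
    yr = yw - Q @ (Q.T @ yw)
    Xr = Xw - Q @ (Q.T @ Xw)
    # Coordinate descent for 0.5*||yr - Xr b||^2 + n*lam*sum(w_j*|b_j|).
    beta = np.zeros(p)
    col_sq = (Xr ** 2).sum(axis=0)
    resid = yr.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = Xr[:, j] @ resid + col_sq[j] * beta[j]
            new = soft_threshold(rho, n * lam * weights[j]) / col_sq[j]
            resid += Xr[:, j] * (beta[j] - new)
            beta[j] = new
    return beta
```

In this sketch, prior biological information enters only through `weights`: a variant in a candidate pathway could be given a weight below 1 so that it is shrunk less than variants without prior support.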

Keywords: Mexican Americans; population structure; prior biological information; relatedness; sparse regression.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Genetic Loci
  • Genetic Variation*
  • Genome-Wide Association Study
  • Humans
  • Models, Genetic
  • Regression Analysis
  • Sequence Analysis, DNA / methods*