Family-based association studies for next-generation sequencing

Am J Hum Genet. 2012 Jun 8;90(6):1028-45. doi: 10.1016/j.ajhg.2012.04.022.

Abstract

An individual's disease risk is determined by the compounded action of both common variants, inherited from remote ancestors, that segregate within the population and rare variants, inherited from recent ancestors, that segregate mainly within pedigrees. Next-generation sequencing (NGS) technologies generate high-dimensional data that allow a nearly complete evaluation of genetic variation. Despite their promise, NGS technologies also suffer from notable limitations: high error rates, enrichment of rare variants, and a large proportion of missing values. In addition, most current analytical methods are designed for population-based association studies. To meet the analytical challenges raised by NGS, we propose a general framework for sequence-based association studies that can use various types of family and unrelated-individual data sampled from any population structure, together with a universal procedure that can transform any population-based association test statistic for use in family-based association tests. We develop family-based versions of several test statistics: functional principal-component analysis (FPCA) with or without smoothing, the generalized T², the combined multivariate and collapsing (CMC) method, and single-marker association tests. Through intensive simulations, we demonstrate that the family-based smoothed FPCA (SFPCA) has correct type I error rates and much more power than other population-based or family-based association analysis methods to detect association of (1) common variants, (2) rare variants, (3) both common and rare variants, and (4) variants with opposite directions of effect. The proposed statistics are applied to two data sets with pedigree structures. The results show that the smoothed FPCA yields much smaller p values than the other statistics.
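The CMC statistic mentioned in the abstract has a simple population-based core, which is the kind of statistic the authors' universal procedure would take as input. The sketch below assumes unrelated cases and controls and illustrates only that core. Rare variants, flagged by a caller-supplied mask (for example a minor-allele-frequency cutoff of the analyst's choosing), are collapsed into a single carrier indicator. That indicator and the common-variant genotypes are then compared between cases and controls with Hotelling's two-sample T². The helper names `collapse_cmc` and `hotelling_t2` and the toy genotypes are hypothetical, and the code does not implement the family-based adjustment for relatedness that the paper develops.

```python
"""Population-based CMC sketch: collapse rare variants, then Hotelling's T^2.

Hypothetical helper names; assumes unrelated cases and controls (no pedigree
adjustment) and minor-allele counts coded 0/1/2.
"""
from typing import List, Sequence

Matrix = List[List[float]]


def collapse_cmc(genotypes: Sequence[Sequence[int]], rare_mask: Sequence[bool]) -> Matrix:
    """Keep common-variant genotypes; replace all rare variants with one
    indicator that is 1.0 if the individual carries any rare minor allele."""
    features = []
    for g in genotypes:
        common = [float(x) for x, rare in zip(g, rare_mask) if not rare]
        carrier = any(x > 0 for x, rare in zip(g, rare_mask) if rare)
        features.append(common + [1.0 if carrier else 0.0])
    return features


def _mean(rows: Matrix) -> List[float]:
    return [sum(col) / len(rows) for col in zip(*rows)]


def _scatter(rows: Matrix, mu: List[float]) -> Matrix:
    # Sum of outer products of centered rows, i.e. (n - 1) times the sample covariance.
    k = len(mu)
    s = [[0.0] * k for _ in range(k)]
    for r in rows:
        d = [a - b for a, b in zip(r, mu)]
        for i in range(k):
            for j in range(k):
                s[i][j] += d[i] * d[j]
    return s


def _solve(a: Matrix, b: List[float]) -> List[float]:
    # Gaussian elimination with partial pivoting for the k x k system a x = b.
    k = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            raise ValueError("pooled covariance matrix is singular")
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, k):
            f = m[r][c] / m[c][c]
            for j in range(c, k + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (m[r][k] - sum(m[r][j] * x[j] for j in range(r + 1, k))) / m[r][r]
    return x


def hotelling_t2(cases: Matrix, controls: Matrix) -> float:
    """Two-sample Hotelling T^2 with a pooled covariance matrix."""
    n1, n2 = len(cases), len(controls)
    mu1, mu2 = _mean(cases), _mean(controls)
    s1, s2 = _scatter(cases, mu1), _scatter(controls, mu2)
    k = len(mu1)
    pooled = [[(s1[i][j] + s2[i][j]) / (n1 + n2 - 2) for j in range(k)] for i in range(k)]
    diff = [a - b for a, b in zip(mu1, mu2)]
    w = _solve(pooled, diff)  # pooled^{-1} (mu1 - mu2)
    return n1 * n2 / (n1 + n2) * sum(d * x for d, x in zip(diff, w))


# Made-up toy data: one row per individual; columns are one common
# variant followed by two rare variants.
rare_mask = [False, True, True]
case_feats = collapse_cmc([[2, 1, 0], [1, 0, 1], [1, 1, 0], [2, 0, 0]], rare_mask)
control_feats = collapse_cmc([[0, 0, 0], [1, 0, 0], [0, 0, 0], [1, 0, 1]], rare_mask)
t2 = hotelling_t2(case_feats, control_feats)
print(round(t2, 6))  # prints 8.0
```

Under multivariate normality, T² can be converted to a p value through F = (n1 + n2 - k - 1) / (k (n1 + n2 - 2)) * T², which follows an F distribution with k and n1 + n2 - k - 1 degrees of freedom, where k is the number of features after collapsing. Pure Python is used only to keep the example dependency-free; a real analysis would rely on a numerical library and, for pedigree data, on a relatedness-aware variant such as the family-based statistics described above.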

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Alleles
  • Asthma / genetics
  • Cardiovascular Diseases / genetics
  • Cohort Studies
  • Family Health
  • Genetic Variation
  • Genetics, Population
  • Genome-Wide Association Study
  • Genotype
  • Humans
  • Models, Genetic
  • Models, Statistical
  • Multivariate Analysis
  • Pedigree
  • Principal Component Analysis
  • Reproducibility of Results
  • Risk
  • Sequence Analysis, DNA / methods*