Family-based association studies for next-generation sequencing

Yun Zhu; Momiao Xiong

doi:10.1016/j.ajhg.2012.04.022

Am J Hum Genet. 2012 Jun 8;90(6):1028-45. doi: 10.1016/j.ajhg.2012.04.022.

Authors

Yun Zhu¹, Momiao Xiong

Affiliation

¹ Human Genetics Center and Division of Biostatistics, The University of Texas School of Public Health, Houston, 77030, USA.

Abstract

An individual's disease risk is determined by the compounded action of both common variants, inherited from remote ancestors, that segregated within the population and rare variants, inherited from recent ancestors, that segregated mainly within pedigrees. Next-generation sequencing (NGS) technologies generate high-dimensional data that allow a nearly complete evaluation of genetic variation. Despite their promise, NGS technologies also suffer from notable limitations: high error rates, enrichment of rare variants, and a large proportion of missing values. In addition, most current analytical methods are designed for population-based association studies. To meet the analytical challenges raised by NGS, we propose a general framework for sequence-based association studies that can use various types of family and unrelated-individual data sampled from any population structure, together with a universal procedure that can transform any population-based association test statistic for use in family-based association tests. We develop family-based versions of functional principal-component analysis (FPCA) with or without smoothing, a generalized T², the combined multivariate and collapsing (CMC) method, and single-marker association test statistics. Through intensive simulations, we demonstrate that the family-based smoothed FPCA (SFPCA) has the correct type I error rates and much more power to detect association of (1) common variants, (2) rare variants, (3) both common and rare variants, and (4) variants with opposite directions of effect than other population-based or family-based association analysis methods. The proposed statistics are applied to two data sets with pedigree structures. The results show that the smoothed FPCA yields much smaller p values than the other statistics.
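To make the "collapsing" idea behind CMC-style tests concrete, the following is a minimal sketch of the collapsing step only, as it is commonly described for population-based data. It is not the authors' family-based procedure, which the abstract does not specify. The function name `collapse_rare_variants`, the genotype coding (minor-allele counts 0/1/2), and the minor-allele-frequency threshold are all assumptions made for illustration.

```python
import numpy as np

def collapse_rare_variants(genotypes, maf_threshold=0.05):
    """Collapse rare variants into one carrier indicator (illustrative sketch).

    genotypes: (n_individuals, n_variants) array of minor-allele counts (0, 1, 2).
    maf_threshold: hypothetical cutoff below which a variant is treated as rare.
    Returns an array whose columns are the common variants (unchanged),
    followed by one 0/1 column marking carriers of at least one rare allele.
    """
    genotypes = np.asarray(genotypes)
    # Sample minor-allele frequency per variant: mean allele count / 2.
    maf = genotypes.mean(axis=0) / 2.0
    rare = maf < maf_threshold
    common_part = genotypes[:, ~rare]
    # An individual is a carrier if any rare variant has a nonzero count.
    carrier = (genotypes[:, rare].sum(axis=1) > 0).astype(int)
    return np.column_stack([common_part, carrier])

# Toy data: 4 individuals, 3 variants; a large threshold is used only
# because the sample is tiny.
toy = np.array([[0, 1, 0],
                [1, 0, 0],
                [2, 0, 1],
                [1, 0, 0]])
collapsed = collapse_rare_variants(toy, maf_threshold=0.2)
```

The collapsed matrix could then be passed to a multivariate statistic such as a T²-type test; adapting that statistic to pedigree data is the part the paper itself addresses and is not attempted here.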

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Algorithms
Alleles
Asthma / genetics
Cardiovascular Diseases / genetics
Cohort Studies
Family Health
Genetic Variation
Genetics, Population
Genome-Wide Association Study
Genotype
Humans
Models, Genetic
Models, Statistical
Multivariate Analysis
Pedigree
Principal Component Analysis
Reproducibility of Results
Risk
Sequence Analysis, DNA / methods*
