Risk Prediction Modeling on Family-Based Sequencing Data Using a Random Field Method

Genetics. 2017 Sep;207(1):63-73. doi: 10.1534/genetics.117.199752. Epub 2017 Jul 5.

Abstract

Family-based design is one of the most popular designs in genetic studies and has many unique features for risk-prediction research. It is robust against genetic heterogeneity, and the relatedness among family members can be informative for predicting an individual's risk for disease with polygenic and shared environmental components of risk. Despite these strengths, family-based designs have been used infrequently in current risk-prediction studies, and their related statistical methods have not been well developed. In this article, we developed a generalized random field (GRF) method for family-based risk-prediction modeling on sequencing data. In GRF, subjects' phenotypes are viewed as stochastic realizations of a random field in a space, and a subject's phenotype is predicted by adjacent subjects, where adjacencies between subjects are determined by their genetic and within-family similarities. Different from existing methods that adjust for familial correlations, the GRF uses this information to form surrogates to further improve prediction accuracy. It also uses within-family information to capture predictors (e.g., rare mutations) that are homogeneous in families. Through simulations, we have demonstrated that the GRF method attained better performance than an existing method by considering additional information from family members and accounting for genetic heterogeneity. We further provided practical recommendations for designing family-based risk prediction studies. Finally, we illustrated the GRF method with an application to a whole-genome exome data set from the Michigan State University Twin Registry study.

Keywords: family-based studies; genetic heterogeneity; high-dimensional data; random field theory.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, N.I.H., Extramural

MeSH terms

  • Genetic Heterogeneity
  • Genetic Predisposition to Disease*
  • Genome-Wide Association Study / methods*
  • Humans
  • Models, Genetic*
  • Pedigree*
  • Sequence Analysis, DNA / methods