Estimation of genetic risk function with covariates in the presence of missing genotypes

Stat Med. 2017 Sep 30;36(22):3533-3546. doi: 10.1002/sim.7376. Epub 2017 Jun 27.

Abstract

In genetic epidemiological studies, family history data are collected on relatives of study participants and used to estimate the age-specific risk of disease for individuals who carry a causal mutation. However, a family member's genotype data may not be collected because of the high cost of an in-person interview to obtain a blood sample or because the relative has died. Previously, efficient nonparametric genotype-specific risk estimation in censored mixture data has been proposed without considering covariates. With multiple predictive risk factors available, risk estimation requires a multivariate model to account for additional covariates that may affect disease risk simultaneously. Therefore, it is important to consider the role of covariates in genotype-specific distribution estimation using family history data. We propose an estimation method that permits more precise risk prediction by controlling for individual characteristics and incorporating interaction effects with missing genotypes in relatives, so that gene-gene interactions and gene-environment interactions can be handled within the framework of a single model. We examine the performance of the proposed methods by simulations and apply them to estimate the age-specific cumulative risk of Parkinson's disease (PD) in carriers of the LRRK2 G2019S mutation using first-degree relatives who are at genetic risk for PD. The utility of the estimated carrier risk is demonstrated through designing a future clinical trial under various assumptions. Such sample size estimation is seen in the Huntington's disease literature using the length of abnormal expansion of a CAG repeat in the HTT gene but is less common in the PD literature. Copyright © 2017 John Wiley & Sons, Ltd.
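
The "mixture" structure that arises when relatives' genotypes are missing can be illustrated with a small sketch. This is not the estimator proposed in the paper: it ignores censoring and covariates. It assumes that each group of relatives has a known carrier probability, so that the group's cumulative risk is a weighted average of the carrier and non-carrier risks. All function names, carrier probabilities, and risk values below are hypothetical and chosen only for illustration.

```python
def mixture_cdf(p_carrier, f_carrier, f_noncarrier):
    """Cumulative risk in a group whose members carry the mutation with
    probability p_carrier but whose genotypes are not observed."""
    return [p_carrier * fc + (1.0 - p_carrier) * fn
            for fc, fn in zip(f_carrier, f_noncarrier)]


def recover_genotype_cdfs(p_a, mixed_a, p_b, mixed_b):
    """Solve, age by age, the 2x2 linear system
         p_a * C + (1 - p_a) * N = mixed_a
         p_b * C + (1 - p_b) * N = mixed_b
    for the carrier risk C and the non-carrier risk N (Cramer's rule)."""
    det = p_a * (1.0 - p_b) - p_b * (1.0 - p_a)  # simplifies to p_a - p_b
    if abs(det) < 1e-12:
        raise ValueError("the two groups need different carrier probabilities")
    carrier, noncarrier = [], []
    for ma, mb in zip(mixed_a, mixed_b):
        carrier.append(((1.0 - p_b) * ma - (1.0 - p_a) * mb) / det)
        noncarrier.append((p_a * mb - p_b * ma) / det)
    return carrier, noncarrier


# Hypothetical cumulative risks at ages 50, 60, 70, 80 (illustrative only).
true_carrier = [0.05, 0.12, 0.20, 0.30]
true_noncarrier = [0.005, 0.01, 0.02, 0.04]

# Hypothetical groups: relatives with carrier probability 0.5 and 0.05.
group_a = mixture_cdf(0.5, true_carrier, true_noncarrier)
group_b = mixture_cdf(0.05, true_carrier, true_noncarrier)

est_carrier, est_noncarrier = recover_genotype_cdfs(0.5, group_a, 0.05, group_b)
```

With noise-free mixture curves, the recovered values match the genotype-specific curves used to build them. With real family history data the group-level curves must themselves be estimated from censored ages at onset, which is where methods such as those described in the abstract come in.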

Keywords: Parkinson's disease; censored data; disease risk estimation; mixture distribution; penetrance function.

MeSH terms

  • Aged
  • Aged, 80 and over
  • Computer Simulation
  • Family
  • Female
  • Gene-Environment Interaction
  • Genetic Predisposition to Disease*
  • Genotype
  • Humans
  • Leucine-Rich Repeat Serine-Threonine Protein Kinase-2
  • Likelihood Functions
  • Male
  • Middle Aged
  • Models, Genetic*
  • Models, Statistical*
  • Mutation
  • Parkinson Disease / genetics
  • Penetrance
  • Proportional Hazards Models
  • Regression Analysis
  • Risk Assessment / methods*
  • Risk Factors

Substances

  • LRRK2 protein, human
  • Leucine-Rich Repeat Serine-Threonine Protein Kinase-2