Dealing with missing data in family-based association studies: a multiple imputation approach

Hum Hered. 2007;63(3-4):229-38. doi: 10.1159/000100481. Epub 2007 Mar 7.

Abstract

To test for association between a disease and a set of linked markers, or to estimate relative risks of disease, several different methods have been developed. Many methods for family data require that individuals be genotyped at the full set of markers and that phase can be reconstructed. Individuals with missing data are excluded from the analysis. This can result in an important decrease in sample size and a loss of information. A possible solution to this problem is to use missing-data likelihood methods. We propose an alternative approach, namely the use of multiple imputation. Briefly, this method consists in estimating from the available data all possible phased genotypes and their respective posterior probabilities. These posterior probabilities are then used to generate replicate imputed data sets via a data augmentation algorithm. We performed simulations to test the efficiency of this approach for case/parent trio data and we found that the multiple imputation procedure generally gave unbiased parameter estimates with correct type 1 error and confidence interval coverage. Multiple imputation had some advantages over missing data likelihood methods with regards to ease of use and model flexibility. Multiple imputation methods represent promising tools in the search for disease susceptibility variants.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Case-Control Studies
  • Data Interpretation, Statistical
  • Family
  • Genetic Predisposition to Disease*
  • Genotype*
  • Haplotypes
  • Humans
  • Models, Statistical
  • Odds Ratio
  • Regression Analysis