A comparison of approaches to account for uncertainty in analysis of imputed genotypes

Genet Epidemiol. 2011 Feb;35(2):102-10. doi: 10.1002/gepi.20552.

Abstract

The availability of extensively genotyped reference samples, such as "The HapMap" and 1,000 Genomes Project reference panels, together with advances in statistical methodology, has allowed for the imputation of genotypes at single nucleotide polymorphism (SNP) markers that are untyped in a cohort or case-control study. These imputation procedures facilitate the interpretation and meta-analysis of genome-wide association studies. A natural question when implementing these procedures is how best to account for uncertainty in imputed genotypes. Here we compare the performance of three strategies: least-squares regression on the "best-guess" imputed genotype; regression on the expected genotype score, or "dosage"; and mixture regression models that more fully incorporate the posterior probabilities of genotypes at untyped SNPs. Using simulation, we considered a range of sample sizes, minor allele frequencies, and imputation accuracies to compare the performance of the different methods under various genetic models. The mixture models performed best in the setting of a large genetic effect and low imputation accuracy. However, for most realistic settings, we find that regressing the phenotype on the estimated allelic or genotypic dosage provides an attractive compromise between accuracy and computational tractability.
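The three strategies compared above can be sketched for a quantitative trait as follows. This is a minimal illustration under stated assumptions, not the authors' code. Each individual's imputation output is taken to be a row of posterior probabilities P(G=0), P(G=1), P(G=2). The additive 0/1/2 coding, the EM-fitted normal mixture (one simple way to use the full posterior probabilities, not necessarily the exact mixture model used in the paper), and the hypothetical `simulate` function with its `accuracy` parameter are all assumptions made here for illustration.

```python
import numpy as np

GENOTYPES = np.array([0.0, 1.0, 2.0])  # assumed additive coding: copies of the minor allele


def best_guess(probs):
    """Hard call: the most probable genotype for each row of an (n, 3) posterior array."""
    return GENOTYPES[np.argmax(probs, axis=1)]


def dosage(probs):
    """Expected genotype score E[G] = P(G=1) + 2 * P(G=2)."""
    return probs @ GENOTYPES


def ols(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]


def mixture_fit(probs, y, n_iter=200):
    """EM for y_i ~ sum_g P(G_i = g) * Normal(b0 + b1 * g, s2), treating the
    imputation posteriors as known mixing weights (an assumed model). Returns (b0, b1, s2)."""
    n = len(y)
    b0, b1 = ols(dosage(probs), y)  # start from the dosage estimate
    s2 = np.var(y)
    g = np.tile(GENOTYPES, n)       # candidate genotype for each (individual, g) pair
    yy = np.repeat(y, 3)            # phenotype repeated once per candidate genotype
    X = np.column_stack([np.ones_like(g), g])
    log_prior = np.log(np.clip(probs, 1e-300, None))
    for _ in range(n_iter):
        # E-step: weight of each candidate genotype given the phenotype
        resid = y[:, None] - (b0 + b1 * GENOTYPES)
        log_w = log_prior - 0.5 * resid**2 / s2
        w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)
        # M-step: weighted least squares on the expanded data, then the residual variance
        ww = w.ravel()
        XtW = X.T * ww
        b0, b1 = np.linalg.solve(XtW @ X, XtW @ yy)
        s2 = np.sum(ww * (yy - b0 - b1 * g) ** 2) / n
    return b0, b1, s2


def simulate(n, maf, accuracy, beta, rng):
    """Hypothetical scheme: shrink a one-hot 'centre' genotype toward the HWE prior,
    then draw the true genotype from that posterior so the posteriors are calibrated."""
    hwe = np.array([(1 - maf) ** 2, 2 * maf * (1 - maf), maf**2])
    centre = rng.choice(3, size=n, p=hwe)
    probs = accuracy * np.eye(3)[centre] + (1 - accuracy) * hwe
    u = rng.random(n)[:, None]
    true_g = np.minimum((u > np.cumsum(probs, axis=1)).sum(axis=1), 2)
    y = beta * true_g + rng.normal(size=n)
    return probs, y


rng = np.random.default_rng(1)
probs, y = simulate(n=20000, maf=0.3, accuracy=0.6, beta=0.5, rng=rng)
slopes = {
    "best_guess": ols(best_guess(probs), y)[1],
    "dosage": ols(dosage(probs), y)[1],
    "mixture": mixture_fit(probs, y)[1],
}
for name, b1 in slopes.items():
    print(f"{name:>10}: {b1:.3f}")
```

Because the simulated posteriors are calibrated by construction (the true genotype is drawn from them), regression on the dosage should recover a slope near the simulated 0.5, as should the mixture fit. With these particular settings the best-guess call always equals the "centre" genotype, so its slope is expected to shrink toward roughly accuracy * beta (about 0.3). The sketch looks only at bias in one simulated data set; it also hints at the cost difference, since every EM iteration of the mixture fit revisits all three candidate genotypes for every individual, whereas the dosage fit is a single least-squares solve.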

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cohort Studies
  • Computer Simulation
  • Data Interpretation, Statistical
  • Genome-Wide Association Study / methods*
  • Genotype*
  • Humans
  • Models, Statistical
  • Molecular Epidemiology / methods*
  • Phenotype
  • Regression Analysis
  • Reproducibility of Results
  • Software
  • Uncertainty