What's the best statistic for a simple test of genetic association in a case-control study?

Genet Epidemiol. 2010 Apr;34(3):246-53. doi: 10.1002/gepi.20455.

Abstract

Genome-wide genetic association studies typically start with univariate statistical tests of each marker. In principle, this single-SNP scanning is statistically straightforward: the testing is done with standard methods (e.g. χ² tests, regression) that have been well studied for decades. However, a number of different tests and testing procedures can be used. In a case-control study, one can use a 1 df allele-based test, a 1 or 2 df genotype-based test, or a compound procedure that combines two or more of these statistics. Additionally, most of the tests can be performed with or without covariates included in the model. While there are a number of statistical papers that make power comparisons among subsets of these methods, none has comprehensively tackled the question of which of the methods in common use is best suited to univariate scanning in a genome-wide association study. In this paper, we consider a wide variety of realistic test procedures, and first compare the power of the different procedures to detect a single locus under different genetic models. We then address the question of whether or when it is a good idea to include covariates in the analysis. We conclude that the most commonly used approach to handle covariates, modeling covariate main effects but not interactions, is almost never a good idea. Finally, we consider the performance of the statistics in a genome scan context.
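The families of single-SNP statistics listed in the abstract can be illustrated on one SNP's 2 × 3 table of genotype counts. The following is a minimal sketch under stated assumptions, not the authors' code. It uses Pearson χ² for the 1 df allele-based test and the 2 df genotype-based test. It takes the Cochran-Armitage trend test with additive scores as one common 1 df genotype-based test. It uses the maximum of three trend statistics (recessive, additive and dominant scores, often called MAX3) as one example of a compound procedure. All function names are hypothetical, covariates are not handled, and the code assumes a polymorphic SNP with no empty rows or columns.

```python
import math

def chi2_sf(stat, df):
    """Upper-tail chi-square probability; closed forms exist for df = 1 and 2."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")

def pearson_chi2(table):
    """Pearson chi-square statistic for an r x c table of counts.
    Assumes no row or column total is zero."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def allelic_test(cases, controls):
    """1 df allele-based test: each subject contributes two alleles.
    cases, controls: genotype counts in the order (AA, Aa, aa)."""
    def allele_counts(g):
        return [2 * g[0] + g[1], g[1] + 2 * g[2]]
    stat = pearson_chi2([allele_counts(cases), allele_counts(controls)])
    return stat, chi2_sf(stat, 1)

def genotypic_test(cases, controls):
    """2 df genotype-based test on the full 2 x 3 genotype table."""
    stat = pearson_chi2([list(cases), list(controls)])
    return stat, chi2_sf(stat, 2)

def trend_stat(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend statistic (1 df) for the given genotype scores."""
    r, s = sum(cases), sum(controls)
    n = r + s
    col = [a + b for a, b in zip(cases, controls)]
    t = sum(w * (s * a - r * b) for w, a, b in zip(scores, cases, controls))
    var = (r * s / n) * (
        sum(w * w * c * (n - c) for w, c in zip(scores, col))
        - 2 * sum(scores[i] * scores[j] * col[i] * col[j]
                  for i in range(3) for j in range(i + 1, 3))
    )
    return t * t / var

def trend_test(cases, controls):
    """Additive-score trend test with its asymptotic 1 df p-value."""
    stat = trend_stat(cases, controls)
    return stat, chi2_sf(stat, 1)

def max3_stat(cases, controls):
    """Compound MAX3 statistic: the largest trend statistic over recessive,
    additive and dominant scores for allele 'a'. Its null distribution is
    NOT chi-square(1); p-values need permutation or specialized methods."""
    score_sets = [(0, 0, 1), (0, 1, 2), (0, 1, 1)]
    return max(trend_stat(cases, controls, sc) for sc in score_sets)
```

For example, with hypothetical counts cases = (30, 50, 20) and controls = (50, 40, 10), the allelic statistic is 9.6, the 2 df genotypic statistic is 85/9 ≈ 9.44, and the additive trend statistic is about 9.23, which is also the MAX3 value here. The allelic test treats alleles as independent draws, which relies on Hardy-Weinberg proportions; the trend test on genotypes does not.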

MeSH terms

  • Alleles
  • Case-Control Studies*
  • Data Interpretation, Statistical*
  • Genetic Association Studies
  • Genetic Predisposition to Disease
  • Genome-Wide Association Study
  • Genotype
  • Heterozygote
  • Humans
  • Models, Genetic
  • Models, Statistical
  • Regression Analysis