Whole genome-wide association study using Affymetrix SNP chip: a two-stage sequential selection method to identify genes that increase the risk of developing complex diseases

Methods Mol Med. 2008;141:23-35. doi: 10.1007/978-1-60327-148-6_2.

Abstract

Whole-genome association studies of complex diseases hold great promise for systematically identifying genetic loci that influence an individual's risk of developing these diseases. However, the polygenic nature of complex diseases and the genetic interactions among genes pose significant challenges for both experimental design and data analysis. High-density genotype data make it possible to identify most of the genetic loci that may be involved in disease etiology. On the other hand, performing a large number of statistical tests can produce false positives if the results are not adequately adjusted for multiple comparisons. In this paper, we discuss a two-stage method that sequentially applies a generalized linear model (GLM) and principal components analysis (PCA) to identify genetic loci that jointly determine the likelihood of developing disease. The method was applied to a pilot case-control study of esophageal squamous cell carcinoma (ESCC) that included 50 ESCC patients and 50 neighborhood-matched controls, genotyped with the Affymetrix 10K SNP chip. We also discuss special considerations that are important for properly interpreting whole-genome association studies, including multiple comparisons, epistatic interactions among multiple genetic loci, and generalization of predictive models.
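The abstract does not give the model details, so the following is only a minimal sketch of what a GLM-then-PCA selection could look like. It assumes an additive 0/1/2 genotype coding, a per-SNP logistic regression (a GLM with a logit link) tested with a Wald test, a Bonferroni cutoff as the multiple-comparison adjustment, and PCA on the SNPs that pass the screen. The function names `logistic_wald_pvalue` and `two_stage_select` and the parameters `alpha` and `n_components` are hypothetical illustrations, not the authors' code.

```python
import math

import numpy as np


def logistic_wald_pvalue(x, y, n_iter=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson and return the
    two-sided Wald p-value for b1. Assumes no complete separation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.all(x == x[0]):
        return 1.0  # monomorphic SNP: carries no information about association
    X = np.column_stack([np.ones_like(x), x])  # intercept + additive genotype
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information
        beta = beta + np.linalg.solve(info, X.T @ (y - p))  # Newton step
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    z = beta[1] / math.sqrt(cov[1, 1])
    return math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value


def two_stage_select(G, y, alpha=0.05, n_components=1):
    """Stage 1: per-SNP GLM screen at a Bonferroni-adjusted threshold.
    Stage 2: PCA on the retained SNPs. G is a samples x SNPs matrix."""
    G = np.asarray(G)
    m = G.shape[1]
    pvals = np.array([logistic_wald_pvalue(G[:, j], y) for j in range(m)])
    selected = np.flatnonzero(pvals < alpha / m)  # Bonferroni: alpha / m tests
    if selected.size == 0:
        return pvals, selected, None, None
    Z = G[:, selected].astype(float)
    Z -= Z.mean(axis=0)  # center each retained SNP before PCA
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    k = min(n_components, Vt.shape[0])
    scores = U[:, :k] * S[:k]  # per-sample scores on the leading PCs
    loadings = Vt[:k].T  # per-SNP weights for each component
    return pvals, selected, scores, loadings
```

Bonferroni is used here only because it is the simplest family-wise correction; with roughly 10,000 SNPs and 100 subjects it is very strict, and a permutation-based or false-discovery-rate threshold could be substituted in stage 1 without changing stage 2.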

Publication types

  • Research Support, N.I.H., Intramural

MeSH terms

  • DNA Mutational Analysis
  • Gene Expression Profiling
  • Genes / physiology
  • Genetic Linkage*
  • Genetic Predisposition to Disease*
  • Genome, Human*
  • Humans
  • Oligonucleotide Array Sequence Analysis*
  • Polymorphism, Single Nucleotide*
  • Research Design
  • Risk Factors