Set association analysis of SNP case-control and microarray data

J Comput Biol. 2003;10(3-4):569-74. doi: 10.1089/10665270360688192.

Abstract

Common heritable diseases ("complex traits") are assumed to be due to multiple underlying susceptibility genes. While genetic mapping methods for Mendelian disorders have been very successful, the search for genes underlying complex traits has been difficult and often disappointing. One of the reasons may be that most current gene-mapping approaches are still based on the conventional methodology of testing one or a few SNPs at a time. Here, we demonstrate a simple strategy that allows for the joint analysis of multiple disease-associated SNPs in different genomic regions. Our set-association method combines information over SNPs by forming sums of relevant single-marker statistics. As previously hypothesized, we show here that this approach successfully addresses the "curse of dimensionality" problem, in which too many variables must be estimated from a comparatively small number of observations. We also report results of simulation studies showing that our method furnishes unbiased and accurate significance levels. Power calculations demonstrate good power even in the presence of large numbers of non-disease-associated SNPs. We extended our method to microarray expression data, where expression levels for large numbers of genes are to be compared between two tissue types. In applications to such data, our approach turned out to be highly efficient.
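
The idea of summing single-marker statistics can be illustrated with a minimal Python sketch. It is not taken from the paper, and everything beyond "form sums of single-marker statistics" is an assumption made for illustration: the allelic chi-square statistic, the cap `max_k` on the number of SNPs summed, the label-permutation scheme used to obtain significance levels, and the function names `marker_stats`, `sum_stats`, and `set_association`.

```python
import numpy as np


def marker_stats(genotypes, is_case):
    """Allelic chi-square statistic for each SNP (assumed single-marker statistic).

    genotypes: array (n_individuals, n_snps) of minor-allele counts 0, 1, or 2.
    is_case:   boolean array (n_individuals,), True for cases.
    """
    g = np.asarray(genotypes, dtype=float)
    y = np.asarray(is_case, dtype=bool)
    a = g[y].sum(axis=0)                 # minor alleles in cases
    b = 2 * y.sum() - a                  # major alleles in cases
    c = g[~y].sum(axis=0)                # minor alleles in controls
    d = 2 * (~y).sum() - c               # major alleles in controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Monomorphic SNPs (denom == 0) carry no information; score them 0.
        return np.where(denom > 0, n * (a * d - b * c) ** 2 / denom, 0.0)


def sum_stats(stats, max_k):
    """Cumulative sums of the max_k largest statistics: S_1, S_2, ..., S_max_k."""
    top = np.sort(stats)[::-1][:max_k]
    return np.cumsum(top)


def set_association(genotypes, is_case, max_k=5, n_perm=200, seed=0):
    """Permutation test on sums of single-marker statistics (illustrative only).

    Returns (best_k, smallest per-k p-value, overall p-value). The overall
    p-value accounts for having chosen the best k by reusing the same
    permutations, which is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(is_case, dtype=bool)
    observed = sum_stats(marker_stats(genotypes, y), max_k)
    permuted = np.array([
        sum_stats(marker_stats(genotypes, rng.permutation(y)), max_k)
        for _ in range(n_perm)
    ])
    sums = np.vstack([observed, permuted])          # row 0 is the observed data
    # p[i, k]: share of rows whose k-th sum is at least as large as row i's.
    p = (sums[None, :, :] >= sums[:, None, :]).mean(axis=1)
    min_p = p.min(axis=1)                           # best p over k, per row
    best_k = int(p[0].argmin()) + 1
    overall_p = float((min_p <= min_p[0]).mean())
    return best_k, float(p[0].min()), overall_p
```

Calling `set_association(genotypes, is_case)` on an individuals-by-SNPs matrix of allele counts returns the number of SNPs in the best-scoring sum together with the two permutation p-values. For expression data, a per-gene two-sample statistic could presumably take the place of `marker_stats`, although the abstract does not specify which statistic the authors used.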

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Case-Control Studies*
  • Computational Biology / methods*
  • Data Interpretation, Statistical*
  • Oligonucleotide Array Sequence Analysis / methods*
  • Polymorphism, Single Nucleotide*