Statistical modeling of large microarray data sets to identify stimulus-response profiles

Proc Natl Acad Sci U S A. 2001 May 8;98(10):5631-6. doi: 10.1073/pnas.101013198.

Abstract

A statistical modeling approach is proposed for searching large microarray data sets for genes that have a transcriptional response to a stimulus. The approach is unrestricted with respect to the timing, magnitude or duration of the response, or the overall abundance of the transcript. The statistical model accommodates systematic heterogeneity in expression levels. The corresponding data analyses yield gene-specific information, and the approach provides a means of evaluating the statistical significance of that information. To illustrate this strategy, we have derived a model of the profile expected for a periodically transcribed gene and used it to search for budding yeast transcripts that fit this profile. Using objective criteria, this method identifies 81% of the known periodic transcripts and 1,088 genes that show significant periodicity in at least one of the three data sets analyzed. However, only one-quarter of these genes show significant oscillations in at least two data sets and can be classified as periodic with high confidence. For each periodic transcript, the method provides estimates of the mean activation and deactivation times and of the induced and basal expression levels, together with statistical measures of the precision of these estimates.
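The abstract does not spell out the model, so the following is only a simplified stand-in that shows how a periodic profile with activation and deactivation times and induced and basal levels can be fitted to one gene's time course and tested for significance. It assumes a square-wave ("on/off") profile within each cycle of known length, fits it by least squares with a grid search over the observed phases, and uses a permutation test as the significance measure. The functions `fit_on_off` and `permutation_p_value`, the square-wave shape, and the permutation test are all hypothetical choices for illustration, not the authors' published model or procedure.

```python
import random
from statistics import fmean


def fit_on_off(times, values, period):
    """Least-squares fit of a hypothetical periodic on/off profile.

    Simplified stand-in (not the paper's model): within each cycle of length
    `period`, expression sits at `induced` from t_on up to t_off and at
    `basal` otherwise, plus noise. Returns (t_on, t_off, basal, induced, rss)
    for the best window, or None if no window has induced > basal.
    """
    phases = [t % period for t in times]
    grid = sorted(set(phases))  # candidate switch times = observed phases
    best = None
    for t_on in grid:
        for t_off in grid:
            if t_on == t_off:
                continue
            if t_on < t_off:
                active = [t_on <= p < t_off for p in phases]
            else:  # window wraps around the end of the cycle
                active = [p >= t_on or p < t_off for p in phases]
            hi = [v for v, a in zip(values, active) if a]
            lo = [v for v, a in zip(values, active) if not a]
            if not hi or not lo:
                continue
            # For a fixed window, the least-squares levels are the group means.
            induced, basal = fmean(hi), fmean(lo)
            if induced <= basal:  # require activation above the basal level
                continue
            rss = sum((v - induced) ** 2 for v in hi) + sum((v - basal) ** 2 for v in lo)
            if best is None or rss < best[4]:
                best = (t_on, t_off, basal, induced, rss)
    return best


def permutation_p_value(times, values, period, n_perm=199, seed=0):
    """Share of shuffled series that fit the on/off profile at least as well.

    Shuffling breaks any link between sampling time and expression while
    keeping the same values, so residual sums of squares are comparable
    across shuffles; a small p-value suggests the fit is unlikely by chance.
    """
    observed = fit_on_off(times, values, period)
    if observed is None:
        return 1.0
    rng = random.Random(seed)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        fit = fit_on_off(times, shuffled, period)
        if fit is not None and fit[4] <= observed[4] + 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Usage on a noise-free synthetic series: two cycles of 12 hypothetical time units.
times = list(range(24))
values = [5.0 if 3 <= t % 12 < 6 else 1.0 for t in times]
print(fit_on_off(times, values, period=12))  # (3, 6, 1.0, 5.0, 0.0)
print(permutation_p_value(times, values, period=12) < 0.05)
```

Restricting the switch times to observed phases keeps the search exhaustive but small; a real analysis would also need to model noise, measurement error and loss of synchrony, which this sketch ignores.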

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • CDC28 Protein Kinase, S cerevisiae / genetics
  • Cell Cycle / genetics
  • Models, Statistical*
  • Oligonucleotide Array Sequence Analysis*
  • RNA, Messenger / genetics
  • Transcription, Genetic

Substances

  • RNA, Messenger
  • CDC28 Protein Kinase, S cerevisiae