Multiscale and Bayesian approaches to data analysis in genomics high-throughput screening

Curr Opin Drug Discov Devel. 2002 May;5(3):428-38.

Abstract

Tremendous amounts of data are produced by the high-throughput screening methods currently employed in drug discovery and product development. A typical cDNA microarray or oligonucleotide-based gene chip experiment easily generates over 10,000 data points for each array or chip. Given the size and number of these datasets, the challenge of inferring meaningful information is formidable. This paper reviews the current status of statistical tools available for gene expression analysis, with emphasis on Bayesian approaches and multiscale wavelet filtering. Fundamental concepts of Bayesian and multiscale modeling are discussed from the perspective of their potential to address important issues in the analysis of gene expression data, such as the fact that genomic data often have non-Gaussian distributions and exhibit localized features and multiple scales in both the frequency and measurement dimensions. Recent publications in these areas are reviewed. Wavelet filtering and the advantages of multiscale methods are demonstrated by application to publicly available gene expression data from the National Cancer Institute (NCI). Multiscale methods, including multiscale principal component analysis (MSPCA), are applied to extract gene subsets and to visualize the data in multiple dimensions for comparison. Similarities among cell lines and the resulting gene selections are effectively visualized and quantitatively compared.
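
As a rough illustration of the multiscale wavelet filtering mentioned in the abstract, the following is a minimal sketch only, not the procedure used in the paper. It assumes a Haar wavelet, a noise estimate taken from the finest-scale coefficients, and soft thresholding with the universal threshold; these choices, and the function names haar_forward, haar_inverse and wavelet_denoise, are illustrative assumptions. The input is assumed to be a one-dimensional expression profile whose length is divisible by 2**levels.

```python
import numpy as np

def haar_forward(x, levels):
    """Multi-level orthonormal Haar transform (illustrative helper).

    Returns the coarsest approximation and a list of detail arrays,
    ordered from finest scale to coarsest scale.
    """
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # local differences
        approx = (even + odd) / np.sqrt(2.0)         # local averages
    return approx, details

def haar_inverse(approx, details):
    """Invert haar_forward by rebuilding each finer scale in turn."""
    approx = np.asarray(approx, dtype=float)
    for d in reversed(details):
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + d) / np.sqrt(2.0)
        out[1::2] = (approx - d) / np.sqrt(2.0)
        approx = out
    return approx

def wavelet_denoise(x, levels=3):
    """Soft-threshold the detail coefficients (assumed filtering rule)."""
    approx, details = haar_forward(x, levels)
    # Assumption: noise scale estimated from the finest details via the
    # median absolute deviation, then the universal threshold is applied.
    sigma = np.median(np.abs(details[0])) / 0.6745
    thresh = sigma * np.sqrt(2.0 * np.log(len(x)))
    shrunk = [np.sign(d) * np.maximum(np.abs(d) - thresh, 0.0) for d in details]
    return haar_inverse(approx, shrunk)

# Synthetic example (not real expression data): a step profile plus noise.
rng = np.random.default_rng(0)
clean = np.concatenate([np.zeros(32), 4.0 * np.ones(32)])
noisy = clean + rng.normal(scale=0.5, size=clean.size)
denoised = wavelet_denoise(noisy, levels=3)
```

Small detail coefficients, which mostly carry noise, are shrunk to zero at every scale, while large localized features survive. A multiscale PCA in the same spirit would apply PCA to the coefficients at each scale before reconstruction; that step is not shown here.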

Publication types

  • Review

MeSH terms

  • Animals
  • Bayes Theorem*
  • Combinatorial Chemistry Techniques / methods*
  • Combinatorial Chemistry Techniques / trends
  • Drug Design*
  • Genomics / methods*
  • Genomics / trends
  • Humans