Analyzing microarray data using cluster analysis

Pharmacogenomics. 2003 Jan;4(1):41-52. doi: 10.1517/phgs.4.1.41.22581.

Abstract

As pharmacogenetics researchers gather more detailed and complex data on gene polymorphisms that affect drug-metabolizing enzymes, drug target receptors and drug transporters, they will need access to advanced statistical tools to mine those data. These tools include approaches from classical biostatistics, such as logistic regression or linear discriminant analysis, and supervised learning methods from computer science, such as support vector machines and artificial neural networks. In this review, we present an overview of another class of models, cluster analysis, which will likely be less familiar to pharmacogenetics researchers. Cluster analysis is used to analyze data that are not known a priori to contain any specific subgroups. The goal is to use the data themselves to identify meaningful or informative subgroups. Specifically, we will focus on demonstrating the use of distance-based methods of hierarchical clustering to analyze gene expression data.
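
As an illustration of what distance-based hierarchical clustering involves, the following is a minimal sketch rather than the method used in the review. It assumes Euclidean distance between expression profiles and average linkage between clusters, neither of which is specified in the abstract. The gene names, expression values and function names are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def hierarchical_clusters(profiles, n_clusters):
    """Agglomerative clustering: start with one cluster per gene and
    repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index of first cluster, index of second)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], profiles)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical data: rows are genes, columns are experimental conditions.
names = ["geneA", "geneB", "geneC", "geneD"]
profiles = [
    [1.0, 1.2, 0.9],
    [1.1, 1.0, 1.0],
    [5.0, 5.2, 4.8],
    [5.1, 4.9, 5.0],
]

groups = hierarchical_clusters(profiles, n_clusters=2)
labelled = [[names[i] for i in g] for g in groups]
print(labelled)
```

In this toy data set the two low-expression genes and the two high-expression genes end up in separate groups without any group labels being supplied in advance, which is the sense in which cluster analysis uses the data to identify subgroups. Other distance measures (for example, correlation-based) and other linkage rules (single or complete) can give different groupings on real data.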

Publication types

  • Review

MeSH terms

  • Cluster Analysis*
  • Genes*
  • Oligonucleotide Array Sequence Analysis / methods*