Analyzing microarray data using cluster analysis

William Shannon; Robert Culverhouse; Jill Duncan

doi:10.1517/phgs.4.1.41.22581

Pharmacogenomics. 2003 Jan;4(1):41-52. doi: 10.1517/phgs.4.1.41.22581.

Authors

William Shannon¹, Robert Culverhouse, Jill Duncan

Affiliation

¹ Department of Medicine, Washington University School of Medicine, 660 S. Euclid Avenue, Campus Box 8005, St. Louis, MO 63110, USA. shannon@ilya.wustl.edu

PMID: 12517285

Abstract

As pharmacogenetics researchers gather more detailed and complex data on gene polymorphisms that affect drug-metabolizing enzymes, drug target receptors and drug transporters, they will need access to advanced statistical tools to mine that data. These tools include approaches from classical biostatistics, such as logistic regression or linear discriminant analysis, and supervised learning methods from computer science, such as support vector machines and artificial neural networks. In this review, we present an overview of another class of models, cluster analysis, which will likely be less familiar to pharmacogenetics researchers. Cluster analysis is used to analyze data that is not known a priori to contain any specific subgroups. The goal is to use the data itself to identify meaningful or informative subgroups. Specifically, we focus on demonstrating the use of distance-based methods of hierarchical clustering to analyze gene expression data.
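
As a rough illustration of what distance-based hierarchical clustering of expression data can look like, the following minimal Python sketch merges gene profiles bottom-up. It is not taken from the review: the gene names, the expression values, the choice of Euclidean distance with average linkage, and the function names (euclidean, average_linkage, hierarchical_clusters) are all assumptions made for this example.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def hierarchical_clusters(profiles, n_clusters):
    """Agglomerative clustering: start with one cluster per profile and
    repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average-linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], profiles),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # a < b, so index a is unaffected by the deletion
    return [sorted(c) for c in clusters]


# Hypothetical toy data: four genes measured across five samples.
expression = {
    "gene_A": [1.0, 1.2, 0.9, 5.0, 5.1],
    "gene_B": [1.1, 1.0, 1.0, 4.8, 5.3],
    "gene_C": [5.0, 4.9, 5.2, 1.0, 0.8],
    "gene_D": [5.1, 5.0, 4.8, 1.2, 1.0],
}
names = list(expression)
profiles = [expression[n] for n in names]

groups = hierarchical_clusters(profiles, n_clusters=2)
labelled = [[names[i] for i in g] for g in groups]
print(labelled)
```

No subgroup labels are supplied in advance; the grouping emerges from the pairwise distances alone. In practice other distance measures (for example, correlation-based distances) and linkage rules are also common, and the choice can change the resulting clusters.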

Publication types

Review

MeSH terms

Cluster Analysis*
Genes*
Oligonucleotide Array Sequence Analysis / methods*