Incorporating biological knowledge into distance-based clustering analysis of microarray gene expression data

Bioinformatics. 2006 May 15;22(10):1259-68. doi: 10.1093/bioinformatics/btl065. Epub 2006 Feb 24.

Abstract

Motivation: Because co-expressed genes are likely to share the same biological function, cluster analysis of gene expression profiles has been applied for gene function discovery. Most existing clustering methods ignore known gene functions in the process of clustering.

Results: To take advantage of accumulating gene functional annotations, we propose incorporating known gene functions into a new distance metric, which shrinks a gene expression-based distance towards 0 if and only if the two genes share a common gene function. A two-step procedure is used. First, the shrinkage distance metric is used in any distance-based clustering method, e.g. K-medoids or hierarchical clustering, to cluster the genes with known functions. Second, while keeping the clustering results from the first step for the genes with known functions, the expression-based distance metric is used to cluster the remaining genes of unknown function, assigning each of them either to one of the clusters obtained in the first step or to a new cluster. A simulation study and an application to gene function prediction in yeast demonstrate the advantage of our proposal over the standard method.
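
The two-step procedure can be illustrated with a minimal Python sketch. The abstract does not give the exact form of the shrinkage, so several details here are assumptions made only for illustration: the multiplicative shrinkage with a hypothetical `shrink` parameter, the Euclidean expression distance, the deterministic K-medoids initialisation, and the `radius` rule that decides when a gene of unknown function starts a new cluster. It is not the authors' implementation.

```python
import math

def expression_distance(x, y):
    """Euclidean distance between two expression profiles (an assumed choice)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def shrinkage_distance(x, y, funcs_x, funcs_y, shrink=0.5):
    """Shrink the expression distance toward 0 only if the genes share a function.

    The multiplicative form and the `shrink` parameter are illustrative assumptions.
    """
    d = expression_distance(x, y)
    return (1.0 - shrink) * d if funcs_x & funcs_y else d

def k_medoids(dist, k, n_iter=20):
    """Plain K-medoids on a precomputed distance matrix (first k points as start)."""
    n = len(dist)
    medoids = list(range(k))

    def assign(meds):
        # Label each point with the index of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(n_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the others.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(medoids), medoids

def assign_unknown(unknown_profiles, centers, radius):
    """Step 2: attach each unannotated gene to the nearest existing cluster,
    or open a new cluster if it is farther than `radius` (an assumed rule)."""
    centers = list(centers)
    labels = []
    for x in unknown_profiles:
        dists = [expression_distance(x, c) for c in centers]
        best = min(range(len(centers)), key=dists.__getitem__)
        if dists[best] <= radius:
            labels.append(best)
        else:
            centers.append(x)
            labels.append(len(centers) - 1)
    return labels

# Toy data (made up): four annotated genes and three genes of unknown function.
known = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
functions = [{"A"}, {"A"}, {"B"}, {"B"}]
unknown = [[0.2, 0.5], [5.1, 5.4], [20.0, 20.0]]

# Step 1: cluster the annotated genes with the shrinkage distance.
n = len(known)
dist = [[shrinkage_distance(known[i], known[j], functions[i], functions[j])
         for j in range(n)] for i in range(n)]
known_labels, medoids = k_medoids(dist, k=2)

# Step 2: keep those clusters fixed; place the unannotated genes using the
# unshrunken expression distance.
unknown_labels = assign_unknown(unknown, [known[m] for m in medoids], radius=3.0)

print(known_labels, unknown_labels)
```

In this toy run, the first two unannotated genes join the existing clusters and the third, which lies far from both medoids, opens a new one. Because step 1 only needs a distance matrix, hierarchical clustering could be substituted for K-medoids without changing the shrinkage step.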

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Biology / methods*
  • Cluster Analysis
  • Databases, Genetic
  • Expert Systems*
  • Gene Expression Profiling / methods*
  • Knowledge Bases*
  • Oligonucleotide Array Sequence Analysis / methods*
  • Pattern Recognition, Automated / methods*