An ensemble biclustering approach for querying gene expression compendia with experimental lists

Riet De Smet; Kathleen Marchal

doi:10.1093/bioinformatics/btr307

Bioinformatics. 2011 Jul 15;27(14):1948-56. doi: 10.1093/bioinformatics/btr307. Epub 2011 May 18.

Authors

Riet De Smet¹, Kathleen Marchal

Affiliation

¹ Department of Plant Systems Biology, VIB, Ghent University, Technologiepark 927, Ghent, Belgium.

PMID: 21593133
DOI: 10.1093/bioinformatics/btr307

Abstract

Motivation: Query-based biclustering techniques interrogate a gene expression compendium with a given gene or gene list by searching the compendium for genes whose profile is close to the average expression profile of the genes in the query-list. Because the genes in a long query-list often cannot be guaranteed to be mutually coexpressed, it is advisable to use each gene separately as a query. This approach, however, leaves the user with tedious post-processing of partially redundant biclustering results. The redundancy is compounded by the need to test multiple parameter settings for each query-gene in order to detect the 'most optimal bicluster size'.
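The query step described above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes Pearson correlation as the closeness measure, uses a hypothetical threshold `min_corr`, and scores genes over all conditions rather than selecting a condition subset, so it shows only the "average profile of the query-list" idea. The function names and the toy data are invented for illustration.

```python
from math import sqrt

def mean_profile(expr, query):
    """Average the expression profiles of the query genes, condition by condition."""
    profiles = [expr[g] for g in query]
    return [sum(col) / len(profiles) for col in zip(*profiles)]

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    den = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / den if den > 0 else 0.0

def query_compendium(expr, query, min_corr=0.9):
    """Return genes whose profile correlates with the query's mean profile.

    min_corr is an assumed, illustrative cutoff, not a value from the paper.
    """
    target = mean_profile(expr, query)
    return sorted(g for g, p in expr.items() if pearson(p, target) >= min_corr)

# Toy compendium: gene -> expression values over four conditions (made-up numbers).
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [2.0, 4.0, 6.0, 8.0],
}
hits = query_compendium(toy, ["geneA"])
```

Running one such query per gene in a list, and repeating it for several thresholds, yields the overlapping result sets whose post-processing motivates the work.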

Results: To aid with this post-processing, we developed an ensemble approach to be used in combination with query-based biclustering. The method relies on a specifically designed consensus matrix in which the biclustering outcomes for multiple query-genes and for different possible parameter settings are merged in a statistically robust way. Clustering of this matrix results in distinct, non-redundant consensus biclusters that maximally reflect the information contained within the original query-based biclustering results. The usefulness of the developed approach is illustrated on a biological case study in Escherichia coli.
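As a rough illustration of the ensemble idea, the sketch below merges several gene sets into a pairwise co-occurrence matrix and groups genes from it. This is a simplification under stated assumptions: the abstract does not specify how the consensus matrix is built or clustered, so plain co-occurrence frequency stands in for the authors' statistically robust matrix, and connected components above a hypothetical cutoff `min_score` stand in for their clustering step. All names and data are invented.

```python
from itertools import combinations

def consensus_matrix(results):
    """Fraction of biclustering results in which each gene pair co-occurs."""
    genes = sorted(set().union(*results))
    counts = {pair: 0 for pair in combinations(genes, 2)}
    for result in results:
        for pair in combinations(sorted(result), 2):
            counts[pair] += 1
    return genes, {pair: c / len(results) for pair, c in counts.items()}

def consensus_clusters(genes, matrix, min_score=0.5):
    """Group genes linked by a co-occurrence score >= min_score (union-find).

    min_score is an assumed, illustrative cutoff, not a value from the paper.
    """
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for (a, b), score in matrix.items():
        if score >= min_score:
            parent[find(a)] = find(b)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(groups.values())

# Gene sets from different query-genes and parameter settings (made-up data).
runs = [
    {"geneA", "geneB", "geneC"},
    {"geneA", "geneB"},
    {"geneA", "geneB", "geneC"},
    {"geneX", "geneY"},
    {"geneX", "geneY", "geneC"},
]
genes, matrix = consensus_matrix(runs)
clusters = consensus_clusters(genes, matrix, min_score=0.4)
```

In this toy input the five partially redundant results collapse into two non-overlapping groups, which is the kind of reduction the consensus step is meant to provide.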

Availability and implementation: Compiled Matlab code is available from http://homes.esat.kuleuven.be/~kmarchal/Supplementary_Information_DeSmet_2011/.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Cluster Analysis
Escherichia coli / genetics
Gene Expression
Gene Expression Profiling / methods*
Oligonucleotide Array Sequence Analysis / methods