Derivation of stable microarray cancer-differentiating signatures using consensus scoring of multiple random sampling and gene-ranking consistency evaluation

Cancer Res. 2007 Oct 15;67(20):9996-10003. doi: 10.1158/0008-5472.CAN-07-1601.

Abstract

Microarrays have been explored for deriving molecular signatures to determine disease outcomes, mechanisms, targets, and treatment strategies. Although exhibiting good predictive performance, some derived signatures are unstable due to noises arising from measurement variability and biological differences. Improvements in measurement, annotation, and signature selection methods have been proposed. We explored a new signature selection method that incorporates consensus scoring of multiple random sampling and multistep evaluation of gene-ranking consistency for maximally avoiding erroneous elimination of predictor genes. This method was tested by using a well-studied 62-sample colon cancer data set and two other cancer data sets (86-sample lung adenocarcinoma and 60-sample hepatocellular carcinoma). For the colon cancer data set, the derived signatures of 20 sampling sets, composed of 10,000 training test sets, are fairly stable with 80% of top 50 and 69% to 93% of all predictor genes shared by all 20 signatures. These shared predictor genes include 48 cancer-related and 16 cancer-implicated genes, as well as 50% of the previously derived predictor genes. The derived signatures outperform all previously derived signatures in predicting colon cancer outcomes from an independent data set collected from the Stanford Microarray Database. Our method showed similar performance for the other two data sets, suggesting its usefulness in deriving stable signatures for biomarker and target discovery.

MeSH terms

  • Biometry / methods
  • Carcinoma, Hepatocellular / genetics
  • Colonic Neoplasms / genetics*
  • Gene Expression Profiling / methods*
  • Humans
  • Liver Neoplasms / genetics
  • Lung Neoplasms / genetics
  • Oligonucleotide Array Sequence Analysis / methods*
  • Pattern Recognition, Automated / methods
  • Reproducibility of Results