Boosting the concordance index for survival data: a unified framework to derive and evaluate biomarker combinations

PLoS One. 2014 Jan 6;9(1):e84483. doi: 10.1371/journal.pone.0084483. eCollection 2014.

Abstract

The development of molecular signatures for the prediction of time-to-event outcomes is a methodologically challenging task in bioinformatics and biostatistics. Although there are numerous approaches for the derivation of marker combinations and their evaluation, the underlying methodology often suffers from the problem that different optimization criteria are mixed during the feature selection, estimation and evaluation steps. This might result in marker combinations that are suboptimal with respect to the evaluation criterion of interest. To address this issue, we propose a unified framework to derive and evaluate biomarker combinations. Our approach is based on the concordance index for time-to-event data, which is a non-parametric measure to quantify the discriminatory power of a prediction rule. Specifically, we propose a gradient boosting algorithm that results in linear biomarker combinations that are optimal with respect to a smoothed version of the concordance index. We investigate the performance of our algorithm in a large-scale simulation study and in two molecular data sets for the prediction of survival in breast cancer patients. Our numerical results show that the new approach is not only methodologically sound but can also yield higher discriminatory power than traditional approaches for the derivation of gene signatures.
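The core idea of the abstract, boosting a linear predictor so that it directly ascends a smoothed concordance index, can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: comparable pairs are Harrell-type pairs (subject i has an observed event before time[j]); higher predictor values are taken to indicate earlier events; the indicator I(eta_i > eta_j) is replaced by a sigmoid with a hypothetical bandwidth sigma; base learners are componentwise least-squares fits on centred covariates with a fixed step length nu; and there is no stopping rule and no censoring weighting (such as the inverse-probability-of-censoring weights used by Uno's estimator). All function names and the toy data are invented for illustration, so this is a sketch of the technique, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def comparable_pairs(time: Sequence[float], event: Sequence[int]) -> List[Tuple[int, int]]:
    """Pairs (i, j) where subject i has an observed event strictly before time[j]."""
    n = len(time)
    return [(i, j) for i in range(n) for j in range(n)
            if event[i] == 1 and time[i] < time[j]]


def harrell_cindex(eta, time, event) -> float:
    """Share of comparable pairs with eta_i > eta_j (assumed: higher eta = earlier event); ties count 1/2."""
    pairs = comparable_pairs(time, event)
    score = 0.0
    for i, j in pairs:
        if eta[i] > eta[j]:
            score += 1.0
        elif eta[i] == eta[j]:
            score += 0.5
    return score / len(pairs)


def sigmoid(u: float) -> float:
    return 1.0 / (1.0 + math.exp(-u))


def smoothed_cindex(eta, pairs, sigma) -> float:
    """C-index with the indicator I(eta_i > eta_j) replaced by sigmoid((eta_i - eta_j) / sigma)."""
    return sum(sigmoid((eta[i] - eta[j]) / sigma) for i, j in pairs) / len(pairs)


def smoothed_cindex_gradient(eta, pairs, sigma) -> List[float]:
    """Partial derivatives of smoothed_cindex with respect to each eta_k."""
    grad = [0.0] * len(eta)
    for i, j in pairs:
        s = sigmoid((eta[i] - eta[j]) / sigma)
        d = s * (1.0 - s) / (sigma * len(pairs))  # derivative of the sigmoid term
        grad[i] += d  # raising eta_i improves this pair
        grad[j] -= d  # raising eta_j worsens it
    return grad


def boost_cindex(X, time, event, n_iter=100, nu=0.1, sigma=0.1):
    """Componentwise linear gradient boosting that ascends the smoothed C-index.

    X is a list of n rows with p covariates. Returns the coefficients for the
    centred covariates and the column means used for centring.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[k] for row in X) / n for k in range(p)]
    Xc = [[row[k] - means[k] for k in range(p)] for row in X]
    pairs = comparable_pairs(time, event)
    beta, eta = [0.0] * p, [0.0] * n
    for _ in range(n_iter):
        g = smoothed_cindex_gradient(eta, pairs, sigma)
        # Fit each single-covariate least-squares base learner (no intercept) to g
        best_k, best_b, best_rss = 0, 0.0, float("inf")
        for k in range(p):
            xk = [row[k] for row in Xc]
            xx = sum(v * v for v in xk)
            if xx == 0.0:
                continue
            b = sum(v * gi for v, gi in zip(xk, g)) / xx
            rss = sum((gi - b * v) ** 2 for v, gi in zip(xk, g))
            if rss < best_rss:
                best_k, best_b, best_rss = k, b, rss
        # Update only the best-fitting component, shrunk by the step length nu
        beta[best_k] += nu * best_b
        eta = [e + nu * best_b * row[best_k] for e, row in zip(eta, Xc)]
    return beta, means


# Invented toy data: covariate 0 orders the event times, covariate 1 is noise
X = [[2.0, 0.3], [1.5, -0.2], [1.0, 0.1], [0.5, 0.4],
     [0.0, -0.3], [-0.5, 0.2], [-1.0, -0.1], [-1.5, 0.0]]
time = [1, 2, 3, 4, 5, 6, 7, 8]
event = [1, 1, 0, 1, 1, 1, 0, 1]

beta, means = boost_cindex(X, time, event)
eta_hat = [sum(b * (x - m) for b, x, m in zip(beta, row, means)) for row in X]
print("coefficients:", beta)
print("Harrell C of boosted predictor:", harrell_cindex(eta_hat, time, event))
```

Because the concordance index depends only on the ordering of the predictor, the scale of the coefficients is not identified by the unsmoothed criterion; in this sketch, sigma and the number of iterations jointly determine how far the coefficients grow, which is why the number of boosting iterations would typically be tuned (for example by cross-validation) rather than fixed at 100. The coefficients refer to centred, unscaled covariates, so their magnitudes are not directly comparable across covariates with different spreads.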

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Biomarkers*
  • Breast Neoplasms / genetics
  • Breast Neoplasms / mortality
  • Breast Neoplasms / pathology
  • Computational Biology / methods
  • Computer Simulation
  • Female
  • Gene Expression Profiling
  • Humans
  • Models, Biological*
  • Models, Statistical*
  • Neoplasm Metastasis
  • Prognosis
  • Survival Analysis*

Substances

  • Biomarkers

Grants and funding

The work of Andreas Mayr and Matthias Schmid was supported by Deutsche Forschungsgemeinschaft (DFG) (www.dfg.de), grant SCHM 2966/1-1. The authors further acknowledge support by Deutsche Forschungsgemeinschaft and Friedrich-Alexander-Universität Erlangen-Nürnberg within the funding programme Open Access Publishing. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.