Allowing for mandatory covariates in boosting estimation of sparse high-dimensional survival models

BMC Bioinformatics. 2008 Jan 10:9:14. doi: 10.1186/1471-2105-9-14.

Abstract

Background: When predictive survival models are built from high-dimensional data, there are often additional covariates, such as clinical scores, that must be included in the final model. While there are several techniques for fitting sparse high-dimensional survival models by penalized parameter estimation, none allows for explicit consideration of such mandatory covariates.

Results: We introduce a new boosting algorithm for censored time-to-event data that shares the favorable properties of existing approaches, i.e., it results in sparse models with good prediction performance, but uses an offset-based update mechanism. This mechanism allows for tailored penalization of the covariates under consideration. Specifically, unpenalized mandatory covariates can be introduced. Microarray survival data from patients with diffuse large B-cell lymphoma, in combination with the recent bootstrap-based prediction error curve technique, are used to illustrate the advantages of the new procedure.
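
The idea of an offset-based update with unpenalized mandatory covariates can be illustrated with a minimal sketch. The code below is not the authors' implementation; it is a simplified componentwise boosting loop for the Cox partial likelihood, assuming NumPy, a Breslow-type risk set, and one-dimensional Newton steps. The function names (score_and_info, boost_cox), the penalty value, and the simulated data are all hypothetical choices made for illustration. In each step the current linear predictor is held fixed as an offset, the mandatory covariates receive unpenalized updates, and exactly one optional covariate receives a penalized update.

```python
import numpy as np

def score_and_info(X, time, status, eta):
    """Per-covariate score and information of the Cox partial likelihood,
    evaluated at a zero update with the current predictor eta as offset."""
    # risk[i, k] is 1 if subject k is still at risk at the time of subject i
    risk = (time[None, :] >= time[:, None]).astype(float)
    w = np.exp(eta)
    s0 = risk @ w
    m1 = (risk @ (w[:, None] * X)) / s0[:, None]       # weighted mean of x
    m2 = (risk @ (w[:, None] * X ** 2)) / s0[:, None]  # weighted mean of x^2
    d = status[:, None]                                 # only events contribute
    score = (d * (X - m1)).sum(axis=0)
    info = (d * (m2 - m1 ** 2)).sum(axis=0)
    return score, info

def boost_cox(X, time, status, mandatory, n_steps=50, penalty=100.0):
    """Componentwise boosting sketch; `mandatory` lists unpenalized columns."""
    n, p = X.shape
    beta = np.zeros(p)
    eta = np.zeros(n)  # current linear predictor, used as the offset
    optional = np.setdiff1d(np.arange(p), mandatory)
    for _ in range(n_steps):
        # Assumption: mandatory covariates get one unpenalized Newton step
        # each, one covariate at a time, before every boosting step.
        for j in mandatory:
            u, i = score_and_info(X[:, [j]], time, status, eta)
            step = u[0] / max(i[0], 1e-12)
            beta[j] += step
            eta += X[:, j] * step
        # Penalized one-step updates for all optional covariates; keep only
        # the one with the largest penalized score statistic.
        u, i = score_and_info(X[:, optional], time, status, eta)
        best = np.argmax(u ** 2 / (i + penalty))
        j = optional[best]
        step = u[best] / (i[best] + penalty)
        beta[j] += step
        eta += X[:, j] * step
    return beta

# Simulated example (hypothetical data): column 0 plays the role of a
# mandatory clinical covariate, column 1 is an informative optional feature.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
eta_true = X[:, 0] + X[:, 1]
event_time = rng.exponential(scale=np.exp(-eta_true))
cens_time = rng.exponential(scale=2.0, size=200)
time = np.minimum(event_time, cens_time)
status = (event_time <= cens_time).astype(float)

beta = boost_cox(X, time, status, mandatory=[0], n_steps=20)
print("mandatory coefficient:", beta[0])
print("selected optional covariates:", np.flatnonzero(beta[1:]) + 1)
```

Because each boosting step changes at most one optional coefficient, the number of nonzero optional coefficients is bounded by the number of steps, which is how sparsity arises in this sketch. The mandatory coefficient is re-estimated in every step regardless of which optional covariate is selected.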

Conclusion: We demonstrate that using an estimation procedure that incorporates mandatory covariates into high-dimensional survival models can be highly beneficial in terms of prediction performance. The new approach also makes it possible to assess whether including microarray features in addition to classical clinical criteria improves predictions.

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomarkers, Tumor / analysis*
  • Computer Simulation
  • Gene Expression Profiling
  • Humans
  • Lymphoma, B-Cell / diagnosis
  • Lymphoma, B-Cell / metabolism*
  • Lymphoma, B-Cell / mortality*
  • Models, Biological
  • Neoplasm Proteins / analysis*
  • Prevalence
  • Proportional Hazards Models*
  • Reproducibility of Results
  • Risk Assessment / methods*
  • Risk Factors
  • Sensitivity and Specificity
  • Software
  • Survival Analysis*
  • Survival Rate

Substances

  • Biomarkers, Tumor
  • Neoplasm Proteins