A general, prediction error-based criterion for selecting model complexity for high-dimensional survival models

Stat Med. 2010 Mar 30;29(7-8):830-8. doi: 10.1002/sim.3765.

Abstract

When fitting predictive survival models to high-dimensional data, an adequate criterion for selecting model complexity is needed to avoid overfitting. The complexity parameter is typically selected by the predictive partial log-likelihood (PLL) estimated via cross-validation. As an alternative criterion, we propose a relative version of the integrated prediction error curve (IPEC), which can be stably estimated via bootstrap resampling. The IPEC has the advantage of being applicable to models and fitting techniques for which the PLL is not available. To investigate the performance of this new criterion, a simulation study mimicking microarray survival data is carried out. Additionally, model selection by the predictive PLL estimated via bootstrap resampling instead of cross-validation is examined. The models selected in this way mostly show prediction performance similar to that of models selected by cross-validation estimates. Model selection by bootstrap estimates of the IPEC performs about as well as selection by cross-validation estimates of the PLL. The IPEC is therefore expected to be a reasonable alternative in cases where the PLL is not available. Similar results are seen in the analysis of a microarray survival data set from patients with diffuse large-B-cell lymphoma.
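
To make the idea of a relative integrated prediction error curve concrete, the following is a minimal sketch rather than the authors' procedure. It rests on several assumptions that the abstract does not state: the prediction error at time t is taken to be the Brier score, "relative" is read as the ratio of a model's IPEC to that of a covariate-free null model, censoring is ignored (so no inverse-probability-of-censoring weights are used), and the bootstrap estimation step is omitted. All function and variable names (`prediction_error`, `ipec`, `relative_ipec`, `null_model`, `perfect_model`) are hypothetical.

```python
from typing import Callable, Sequence

# predict(i, t) returns the predicted probability that subject i survives past t.
Predictor = Callable[[int, float], float]


def prediction_error(t: float, event_times: Sequence[float], predict: Predictor) -> float:
    """Brier score at time t, assuming every event time is observed (no censoring)."""
    n = len(event_times)
    total = 0.0
    for i, event_time in enumerate(event_times):
        alive = 1.0 if event_time > t else 0.0  # observed status at time t
        total += (alive - predict(i, t)) ** 2
    return total / n


def ipec(event_times: Sequence[float], predict: Predictor, grid: Sequence[float]) -> float:
    """Integrate the prediction error curve over the time grid (trapezoidal rule)."""
    errs = [prediction_error(t, event_times, predict) for t in grid]
    return sum(
        0.5 * (errs[k] + errs[k + 1]) * (grid[k + 1] - grid[k])
        for k in range(len(grid) - 1)
    )


def relative_ipec(
    event_times: Sequence[float],
    predict: Predictor,
    null_predict: Predictor,
    grid: Sequence[float],
) -> float:
    """Assumed definition: model IPEC divided by the null model's IPEC."""
    return ipec(event_times, predict, grid) / ipec(event_times, null_predict, grid)


# Toy data (illustrative only, not from the paper).
event_times = [1.0, 2.0, 3.0, 4.0]
grid = [0.5, 1.5, 2.5, 3.5]


def null_model(i: int, t: float) -> float:
    """Covariate-free prediction: empirical fraction of subjects surviving past t."""
    return sum(1 for event_time in event_times if event_time > t) / len(event_times)


def perfect_model(i: int, t: float) -> float:
    """Oracle prediction that knows each subject's event time."""
    return 1.0 if event_times[i] > t else 0.0


print(relative_ipec(event_times, null_model, null_model, grid))
print(relative_ipec(event_times, perfect_model, null_model, grid))
```

Under this assumed definition, the null model scores 1 by construction and an oracle scores 0, so values below 1 indicate that a model predicts better than ignoring the covariates. For model selection one would evaluate such a criterion across candidate complexity values on data not used for fitting and pick the value with the smallest estimate.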

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bias
  • Biomarkers, Tumor / analysis
  • Biostatistics*
  • Computer Simulation / statistics & numerical data
  • Humans
  • Likelihood Functions
  • Lymphoma, Large B-Cell, Diffuse / mortality
  • Models, Statistical*
  • Multivariate Analysis
  • Survival Analysis*

Substances

  • Biomarkers, Tumor