Accurate risk prediction models play a key role in precision medicine, where optimal individualized disease prevention and treatment strategies can be formed based on predicted risks. In many clinical settings, it is of great interest to predict the t-year risk of developing a clinical event using baseline covariates. Such t-year risk models can be estimated by fitting standard survival models, including the Cox proportional hazards model and the more flexible t-year specific generalized linear model (t-GLM). However, efficient and robust estimation of the risk model is challenging under heavy censoring and potential model misspecification. Intermediate outcomes observed prior to censoring can be highly predictive of the outcome and, thus, may be used to improve the efficiency of the model estimation. However, existing augmentation methods either do not allow intermediate outcomes to be subject to censoring, or exhibit limited efficiency gains. Here, we propose a two-step augmentation method to improve the estimation of the t-year risk model by leveraging longitudinally collected intermediate outcome information that is subject to censoring. Our method allows for the easy incorporation of regularization to accommodate a moderate number of covariates and rare events. We also propose resampling methods to assess the variability of our proposed estimators. Our numerical studies show that the proposed point and interval estimation procedures perform well in finite samples. We also demonstrate that our proposed estimators are substantially more efficient than existing methods. Finally, we illustrate the proposed methods using data from the Diabetes Prevention Program, a randomized clinical trial on high-risk subjects.
Keywords: Efficiency augmentation; intermediate outcomes; model misspecification; risk prediction; robustness; survival.