Analyzing the large-scale survival data from the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) Program may help guide the management of cancer. Detecting and characterizing the time-varying effects of factors collected at the time of diagnosis could reveal important and useful patterns. However, fitting a time-varying effect model by maximizing the partial likelihood with such large-scale survival data is not feasible with most existing software. Moreover, estimating time-varying coefficients using spline based approaches requires a moderate number of knots, which may lead to unstable estimation and over-fitting issues. To resolve these issues, adding a penalty term greatly aids estimation. The selection of penalty smoothing parameters is difficult in this time-varying setting, as traditional ways like using Akaike information criterion do not work, while cross-validation methods have a heavy computational burden, leading to unstable selections. We propose modified information criteria to determine the smoothing parameter and a parallelized Newton-based algorithm for estimation. We conduct simulations to evaluate the performance of the proposed method. We find that penalization with the smoothing parameter chosen by a modified information criteria is effective at reducing the mean squared error of the estimated time-varying coefficients. Compared to a number of alternatives, we find that the estimates of the variance derived from Bayesian considerations have the best coverage rates of confidence intervals. We apply the method to SEER head-and-neck, colon, prostate, and pancreatic cancer data and detect the time-varying nature of various risk factors.
Keywords: Cancer study; information criteria; penalization; survival analysis; time-varying effects.