Objective: Accurate predictive models for second primary non-small cell lung cancer (SP-NSCLC) are limited. This study aimed to develop and validate overall survival (OS) prediction models for SP-NSCLC patients using time-dependent interpretable survival machine learning algorithms.
Methods: Data on patients aged 20-89 years diagnosed with SP-NSCLC between 1988 and 2020 were extracted from the Surveillance, Epidemiology, and End Results (SEER) database (8 and 12 registries). The dataset was divided into a development cohort and external temporal and spatial validation cohorts. Predictors included demographic, clinical, pathological, and initial primary cancer-related features. Multiple survival machine learning algorithms were developed and validated, and model performance was assessed using the C-index, the time-dependent area under the receiver operating characteristic curve (time-AUC), and the time-dependent Brier Score. Time-dependent interpretability analysis was used to explore the time-dependent feature importance of key predictors.
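As an illustration of the discrimination metric named above, the following is a minimal sketch of Harrell's C-index for right-censored data. It is not the study's implementation: the function name `harrell_c_index`, its argument names, and the toy data are hypothetical, and pairs with tied observed times are simply skipped, which is a simplifying assumption.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times       -- observed follow-up times
    events      -- 1/True if death was observed, 0/False if censored
    risk_scores -- model output where a higher score means higher predicted risk
    (All names are illustrative; tied observed times are skipped for simplicity.)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the subject with the shorter observed time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier death received the higher risk score
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): the second subject is censored at t=10.
toy_times = [5, 10, 15, 20]
toy_events = [1, 0, 1, 1]
toy_risk = [0.9, 0.4, 0.6, 0.2]
toy_c_index = harrell_c_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of survival times by predicted risk.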
Results: The Blackboost model demonstrated excellent performance (C-index: 0.7517; time-AUC: 0.8438) and good calibration (time-Brier Score: 0.0754). External validations and subgroup analyses demonstrated the model's robustness, generalizability, and fairness. Using the optimal cutoff threshold, high-risk groups could be effectively identified. Surgery was the most critical predictor across the entire survival period. Combined stage (distant) and chemotherapy were the second most important predictors from 0 to 5 years, whereas age took their place from 5 to 20 years. Additionally, we developed an online visualization tool.
Conclusions: The Blackboost survival model achieved accurate, fair, and robust survival prediction for SP-NSCLC patients. Surgery, combined stage (distant), chemotherapy, and age contributed differently across various survival periods. The online visualization tool facilitated personalized survival predictions.
Keywords: Machine learning; Overall survival prediction; Second primary non-small cell lung cancer; Surgery; Time-dependent interpretability.
Copyright © 2024 Elsevier B.V. All rights reserved.