Background: Existing prognostic models for esophageal squamous cell carcinoma (ESCC) have low accuracy. This study therefore aimed to optimize the least squares support vector machine (LSSVM) algorithm by using a Cloud model to determine the uncertain prognostic factors, and consequently to establish a new high-precision prognostic model for ESCC.
Methods: We studied 4,771 ESCC patients (training samples) from the Surveillance, Epidemiology, and End Results (SEER) database and 635 ESCC patients (validation samples) from the Henan Provincial Center for Disease Control and Prevention (HCDC) database, applying the same inclusion and exclusion criteria to both databases. Permission to access the SEER research data file was obtained from the National Cancer Institute. Independent risk factors were analyzed using the log-rank test, survival curves, and univariate and multivariate Cox analyses. Finally, the independent prognostic factors were used to construct nomogram, random forest, and Cloud-LSSVM prognostic models, which were then validated.
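To illustrate how a Cloud model represents an uncertain factor, the following is a minimal sketch of the standard forward normal cloud generator, which describes a concept by its expectation (Ex), entropy (En), and hyper-entropy (He). It is not the study's implementation: the function name `forward_normal_cloud` and the example values for age are hypothetical, and the way cloud outputs feed into LSSVM is not shown here.

```python
import math
import random


def forward_normal_cloud(ex, en, he, n_drops, seed=None):
    """Generate cloud drops (x, membership) for a normal cloud (Ex, En, He).

    Hypothetical helper for illustration; not taken from the study.
    """
    if en <= 0 or he < 0:
        raise ValueError("en must be positive and he must be non-negative")
    rng = random.Random(seed)
    drops = []
    for _ in range(n_drops):
        # Randomness of the fuzziness: perturb the entropy by the hyper-entropy.
        en_prime = rng.gauss(en, he) or en  # fall back to en if exactly zero
        # Draw a drop around the expectation with the perturbed spread.
        x = rng.gauss(ex, abs(en_prime))
        # Fuzziness: degree to which x belongs to the concept.
        mu = math.exp(-((x - ex) ** 2) / (2 * en_prime ** 2))
        drops.append((x, mu))
    return drops


# Assumed example: "age at diagnosis" as a concept with Ex=60, En=8, He=0.5.
drops = forward_normal_cloud(60.0, 8.0, 0.5, n_drops=1000, seed=0)
```

Each drop pairs a sampled value with a membership degree, so a single factor carries both random variation (through the sampling) and fuzziness (through the membership).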
Results: The overall median survival time was 14 months in the SEER samples (46 months in the HCDC samples), the mean survival time was 26.5 months (36.8 months in the HCDC samples), and the 3-year survival rate was 65.8%. The difference arises because most patients in the Henan samples had early ESCC, whereas most SEER patients had T3 or T4 disease. The multivariate Cox analysis showed that age at diagnosis (P<0.001), sex (P=0.001), race (P=0.002), differentiation grade (P<0.001), pathologic T category (P<0.001), and pathologic M category (P<0.001) were factors affecting the prognosis of ESCC patients. In the SEER and HCDC data, respectively, the accuracy of the Cloud-LSSVM model (C-index = 0.71, 0.689) was higher than that of differentiation grade (C-index = 0.548, 0.506), random forest (C-index = 0.649, 0.498), and the nomogram (C-index = 0.659, 0.563). The new model combines the Cloud model's unified treatment of randomness and fuzziness with the powerful learning and non-linear mapping abilities of LSSVM.
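For readers unfamiliar with the C-index used to compare the models, the following is a minimal sketch of Harrell's concordance index for right-censored survival data. It is an illustration under simplifying assumptions (pairs with tied survival times are skipped), not the computation used in the study; the function name `concordance_index` and the toy data are hypothetical.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs whose risk ordering matches survival.

    times       -- follow-up time for each patient
    events      -- 1 if death was observed, 0 if censored
    risk_scores -- model output, higher meaning worse predicted prognosis
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # Skip tied times (simplification) and pairs whose shorter time is censored.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier death had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration): four patients, the third is censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
c = concordance_index(times, events, [0.9, 0.7, 0.4, 0.1])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported C-index values should be read.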
Conclusions: Because the training and test samples come from different ethnic groups, prediction accuracy is generally not high; nevertheless, the accuracy of the Cloud-LSSVM model is much higher than that of the other models. The new model provides clear prognostic superiority over the random forest, nomogram, and other models.
Keywords: Cloud-least squares support vector machine (Cloud-LSSVM); Esophageal squamous cell carcinoma (ESCC); Surveillance, Epidemiology, and End Results (SEER); machine learning; prognostic.
2023 Journal of Thoracic Disease. All rights reserved.