Objective: This study aimed to construct a machine learning-based predictive model to advance the diagnosis of ovarian cancer.
Methods: A retrospective analysis of patients with a pelvic/adnexal/ovarian mass was performed. As many potential features related to ovarian cancer as possible were collected. The optimal machine learning algorithm was selected from six candidates through 5-fold cross-validation. The 20 features with the greatest predictive importance were ranked using the SHapley Additive exPlanations (SHAP) method. Clinical validation was then performed to confirm whether the model could advance the diagnosis of ovarian cancer.
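As an illustration of the selection step, the following is a minimal sketch of choosing among candidate algorithms by mean 5-fold cross-validated accuracy. It is not the study's pipeline: the data are synthetic, the two toy candidates (`fit_majority`, `fit_threshold`) are hypothetical stand-ins for the six real algorithms, and all function names are assumptions made for this example.

```python
import random
from statistics import mean

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_val_accuracy(fit, X, y, k=5, seed=0):
    """Mean held-out accuracy of one candidate over k folds."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        # Fit on the k-1 training folds, then score on the held-out fold.
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return mean(scores)

# Hypothetical toy candidates (assumptions, not the algorithms from the study).
def fit_majority(X_train, y_train):
    """Always predict the most common training label."""
    majority = max(set(y_train), key=y_train.count)
    return lambda x: majority

def fit_threshold(X_train, y_train):
    """Predict 1 when the first feature exceeds the midpoint of the class means."""
    pos = [x[0] for x, label in zip(X_train, y_train) if label == 1]
    neg = [x[0] for x, label in zip(X_train, y_train) if label == 0]
    cut = (mean(pos) + mean(neg)) / 2
    return lambda x: 1 if x[0] > cut else 0

def select_best(candidates, X, y, k=5):
    """Return (best_name, scores), where scores maps name -> mean CV accuracy."""
    scores = {name: cross_val_accuracy(fit, X, y, k) for name, fit in candidates.items()}
    return max(scores, key=scores.get), scores

# Synthetic data: one feature that tends to be higher in the positive class.
rng = random.Random(42)
X = [[rng.gauss(0, 1)] for _ in range(100)] + [[rng.gauss(3, 1)] for _ in range(100)]
y = [0] * 100 + [1] * 100

candidates = {"majority": fit_majority, "threshold": fit_threshold}
best_name, cv_scores = select_best(candidates, X, y)
print(best_name, cv_scores)
```

Every candidate is scored on the same folds, so differences in mean accuracy reflect the algorithms and not the particular split.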
Results: A total of 9,799 patients were included. The inclusion criteria were age >18 years, a first diagnosis of pelvic/adnexal/ovarian mass of undetermined significance, and an available pathological report. A 438-dimensional feature set was obtained after filtering. LightGBM showed the best performance, with an accuracy of 88%. Among the top 20 features, 55% came from laboratory test reports, 35% from imaging examination reports, and 10% from basic demographics and main symptoms. Age, CA125, and the Risk of Ovarian Malignancy Algorithm (ROMA) were the top three. The predictive model performed stably on the testing and clinical validation datasets and was found to advance the diagnosis of ovarian cancer by about 17 days relative to clinical pathological examination.
Conclusion: LightGBM was the optimal algorithm for our predictive model, with an accuracy of 88%. Laboratory tests and imaging examinations played essential roles in diagnosing ovarian cancer. Our model could advance the diagnosis of ovarian cancer ahead of clinical pathological examination.
Keywords: Machine Learning; Machine Prediction Methods; Ovarian Cancer; Predictive Learning Models; Risk Factors.
© 2025. Asian Society of Gynecologic Oncology, Korean Society of Gynecologic Oncology, and Japan Society of Gynecologic Oncology.