Development of a COVID-19 early risk assessment system based on multiple machine learning algorithms and routine blood tests: a real-world study

Qiangqiang Qin; Qingxuan Li; Guiyin Zhu; Haiyang Yu; Mingyan Peng; Shuang Wu; Xue Xu; Wen Gu; Xuejun Guo

doi:10.3389/fimmu.2024.1430899

Front Immunol. 2024 Sep 30;15:1430899. doi: 10.3389/fimmu.2024.1430899. eCollection 2024.

Authors

Qiangqiang Qin^#¹, Qingxuan Li^#², Guiyin Zhu¹, Haiyang Yu¹, Mingyan Peng³, Shuang Wu⁴, Xue Xu¹, Wen Gu¹, Xuejun Guo¹

Affiliations

¹ Department of Respiratory Medicine, Xinhua Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, China.
² Department of Respiratory and Critical Care Medicine, The Second Hospital of Jilin University, Changchun, Jilin, China.
³ Department of Gynecology and Obstetrics, Xinhua Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, China.
⁴ Stomatologic Hospital and College, Anhui Medical University, Hefei, China.

^# Contributed equally.

Abstract

Background: The rapid spread of Coronavirus Disease 2019 (COVID-19) has placed an enormous burden on healthcare systems and economies worldwide. An early risk assessment system based on multiple machine learning (ML) algorithms may provide more accurate guidance for classifying COVID-19 patients and offer predictive, preventive, and personalized medicine (PPPM) solutions in the future.

Methods: In this retrospective study, we first divided one portion of the data into training and validation cohorts at a 7:3 ratio and established models based on combinations of two ML algorithms. We then used a separate portion of the data as an independent testing cohort to identify the most accurate and stable model and compared it with other scoring systems. Finally, we categorized patients by risk score and examined the correlation between their clinical data and risk scores.
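The 7:3 division into training and validation cohorts can be illustrated with a minimal sketch. The function name `split_cohort`, the fixed seed, and the synthetic patient IDs are assumptions for illustration only; the abstract does not describe how the split was actually implemented.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=42):
    """Shuffle patient IDs reproducibly and split them into two cohorts.

    Hypothetical helper: the seed and rounding rule are illustrative choices.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)      # local generator, so the split is repeatable
    rng.shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


# Synthetic example: 100 placeholder patient IDs, not study data.
train_ids, valid_ids = split_cohort(range(100))
print(len(train_ids), len(valid_ids))
```

Splitting on patient IDs rather than on individual records keeps all measurements from one patient in the same cohort, which avoids leakage between training and validation.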

Results: Elderly patients accounted for the majority of those hospitalized with COVID-19. The model constructed by combining the stepcox[both] and survivalSVM algorithms achieved a C-index of 0.840 in the training cohort and 0.815 in the validation cohort, and it had the highest C-index in the testing cohort when compared with the other 119 ML model combinations. Compared with current scoring systems, including CURB-65 and several previously reported prognostic models, our model had the highest AUC value (0.778), indicating better predictive performance. In addition, the model's AUC values at specific time points (days 7, 14, and 28) demonstrated excellent predictive performance. Most importantly, stratifying patients by the model's risk score revealed a difference in survival status among the high-risk, median-risk, and low-risk groups, indicating that a new and stable risk assessment system had been built. Finally, we found that COVID-19 patients with a history of cerebral infarction had a significantly higher risk of death.
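For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index for right-censored survival data. It is a simplified textbook version (pairs with tied follow-up times are skipped), not the authors' implementation, and the function name and toy data are assumptions for illustration.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (simplified).

    A pair (i, j) is comparable when patient i has the shorter follow-up
    time and an observed event (events[i] == 1). The pair is concordant
    when patient i also has the higher risk score; tied scores count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (not study data): follow-up days, event flag (1 = death), risk score.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
scores = [0.9, 0.4, 0.6, 0.1]
c_index = harrell_c_index(times, events, scores)
print(c_index)  # 4 of 5 comparable pairs are concordant -> 0.8
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking of patients by risk.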

Conclusion: This novel risk assessment system predicts the prognosis of patients with COVID-19 with high accuracy, particularly in elderly patients, and can be readily applied within the PPPM framework. Our ML model facilitates stratified patient management while promoting the optimal use of healthcare resources.
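Stratified management of this kind requires mapping each risk score to a group. The sketch below assigns low-, median-, and high-risk labels using tertile cutoffs. The tertile rule, the function name `stratify_by_tertiles`, and the example scores are assumptions for illustration; the abstract does not state which cutoffs the authors used.

```python
def stratify_by_tertiles(risk_scores):
    """Label each score 'low', 'median', or 'high' using tertile cutoffs.

    Assumed rule (not from the source): cutoffs are the scores found at
    one third and two thirds of the sorted list. Needs at least 3 scores.
    """
    if len(risk_scores) < 3:
        raise ValueError("need at least three risk scores")
    ordered = sorted(risk_scores)
    n = len(ordered)
    low_cut = ordered[n // 3]
    high_cut = ordered[(2 * n) // 3]
    groups = []
    for score in risk_scores:
        if score < low_cut:
            groups.append("low")
        elif score < high_cut:
            groups.append("median")
        else:
            groups.append("high")
    return groups


# Toy risk scores (not study data), deliberately unsorted.
scores = [0.7, 0.1, 0.5, 0.9, 0.3, 0.2, 0.8, 0.4, 0.6]
groups = stratify_by_tertiles(scores)
print(list(zip(scores, groups)))
```

In practice, cutoffs would be fixed on the training cohort and then applied unchanged to the validation and testing cohorts, so that group membership does not depend on the data being evaluated.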

Keywords: COVID-19; categorized treatment; machine learning; predictive model; predictive preventive personalized medicine.

MeSH terms

Adult
Aged
Aged, 80 and over
Algorithms*
COVID-19* / diagnosis
Female
Hematologic Tests / methods
Humans
Machine Learning*
Male
Middle Aged
Retrospective Studies
Risk Assessment / methods
SARS-CoV-2*

Grants and funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work received funding from the Shanghai Science and Technology Commission (Grant No. 22Y11901700).