Machine learning model for cardiovascular disease prediction in patients with chronic kidney disease

Front Endocrinol (Lausanne). 2024 May 28:15:1390729. doi: 10.3389/fendo.2024.1390729. eCollection 2024.

Abstract

Introduction: Cardiovascular disease (CVD) is the leading cause of death in patients with chronic kidney disease (CKD). This study aimed to develop CVD risk prediction models using machine learning to support clinical decision making and improve patient prognosis.

Methods: Electronic medical records of patients with CKD treated at a single center from 2015 to 2020 were used to develop machine learning models for predicting CVD. Least absolute shrinkage and selection operator (LASSO) regression was used to select important features for predicting the risk of developing CVD. Seven machine learning classification algorithms were used to build models, which were evaluated using receiver operating characteristic curves, accuracy, sensitivity, specificity, and F1-score; SHapley Additive exPlanations (SHAP) was used to interpret the model results. CVD was defined as a composite of cardiovascular events, determined by reviewing medical records: coronary heart disease (coronary artery disease, myocardial infarction, angina pectoris, and coronary artery revascularization), cerebrovascular disease (hemorrhagic stroke and ischemic stroke), death from any cause (cardiovascular, non-cardiovascular, or unknown cause), congestive heart failure, and peripheral artery disease (aortic aneurysm, aortic or other peripheral arterial revascularization).
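As an illustration of the threshold-based evaluation metrics named above (accuracy, sensitivity, specificity, and F1-score), the following is a minimal sketch in plain Python. The function names, the `ConfusionCounts` type, and the toy labels are hypothetical and are not taken from the study; the study's own evaluation code is not described in the abstract.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Cells of a binary confusion matrix (1 = CVD event, 0 = no event)."""
    tp: int
    fp: int
    tn: int
    fn: int


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives from paired 0/1 labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def classification_metrics(c):
    """Return accuracy, sensitivity, specificity, and F1 from the counts.

    A metric whose denominator is zero is reported as 0.0 (a convention
    chosen for this sketch, not something stated in the source).
    """
    total = c.tp + c.fp + c.tn + c.fn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    sensitivity = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    specificity = c.tn / (c.tn + c.fp) if (c.tn + c.fp) else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    print(classification_metrics(confusion_counts(y_true, y_pred)))
```

Sensitivity and specificity are reported separately because, with an event rate well below 50%, accuracy alone can look favorable for a model that misses many events.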

Results: This study included 8,894 patients with CKD, of whom 2,304 (25.9%) experienced a composite CVD event. LASSO regression identified eight important features for predicting the risk of CVD in patients with CKD: age, history of hypertension, sex, antiplatelet drug use, high-density lipoprotein, sodium ions, 24-h urinary protein, and estimated glomerular filtration rate. In the test set, the model developed using Extreme Gradient Boosting had an area under the curve of 0.89 and outperformed the other models, indicating the best CVD predictive performance among the models compared.
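To make the area under the curve concrete: it can be read as the probability that a randomly chosen patient with an event receives a higher predicted risk than a randomly chosen patient without one. The sketch below computes it by direct pairwise comparison in plain Python. The function name `roc_auc` and the toy scores are hypothetical and are not the study's data or code.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Every (event, non-event) pair contributes 1 if the event case has the
    higher score, 0.5 if the scores tie, and 0 otherwise. The mean over all
    pairs equals the AUC. This is O(n_pos * n_neg), fine for a small demo.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy predicted risks for illustration only; not data from the study.
    y_true = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(y_true, scores))  # 3 of 4 pairs ranked correctly
```

Under this reading, an area under the curve of 0.89 means that about 89% of such event/non-event pairs are ranked in the correct order by the model.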

Conclusion: This study established a CVD risk prediction model for patients with CKD, based on routine clinical diagnostic and treatment data, with good predictive accuracy. This model is expected to provide a scientific basis for the management and treatment of patients with CKD.

Keywords: cardiovascular disease; chronic kidney disease; electronic medical records; machine learning; prediction model.

MeSH terms

  • Adult
  • Aged
  • Cardiovascular Diseases* / epidemiology
  • Cardiovascular Diseases* / etiology
  • Female
  • Humans
  • Machine Learning*
  • Male
  • Middle Aged
  • Prognosis
  • Renal Insufficiency, Chronic* / complications
  • Renal Insufficiency, Chronic* / epidemiology
  • Retrospective Studies
  • Risk Assessment / methods
  • Risk Factors

Grants and funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This study was supported by the National Natural Science Foundation of China (No. 32141005, No. 82030025), Capital’s Funds for Health Improvement and Research, China (No. Z221100007422121), Young Talent Project of Chinese PLA General Hospital (No. 2019XXMBD-005, No. 2019XXJSYX01), Project of the Department of the Integrated Traditional Chinese and Western Medicine and Ethnic Minority Medicine of the State Administration of Traditional Chinese Medicine (No. 2023384), and High Level Key Disciplines of Traditional Chinese Medicine of the State Administration of Traditional Chinese Medicine (No. zyyzdxk−2023310).