A Machine Learning-Based Prediction Model for Cardiovascular Risk in Women With Preeclampsia

Guan Wang; Yanbo Zhang; Sijin Li; Jun Zhang; Dongkui Jiang; Xiuzhen Li; Yulin Li; Jie Du

doi:10.3389/fcvm.2021.736491

Front Cardiovasc Med. 2021 Oct 27;8:736491. doi: 10.3389/fcvm.2021.736491. eCollection 2021.

Authors

Guan Wang¹,², Yanbo Zhang³, Sijin Li⁴, Jun Zhang¹, Dongkui Jiang², Xiuzhen Li², Yulin Li¹, Jie Du¹

Affiliations

¹ Beijing Anzhen Hospital, Capital Medical University, The Key Laboratory of Remodeling-Related Cardiovascular Diseases, Ministry of Education, Beijing Institute of Heart, Lung and Blood Vessel Diseases, Beijing, China.
² Beijing University of Chinese Medicine Third Affiliated Hospital, Beijing, China.
³ Department of Health Statistics, School of Public Health, Shanxi Medical University, Shanxi Key Laboratory of Major Diseases Risk Assessment, Taiyuan, China.
⁴ First Hospital of Shanxi Medical University, Molecular Imaging Precision Medicine Collaborative Innovation Center, Shanxi Medical University, Taiyuan, China.

Abstract

Objective: Preeclampsia affects 2-8% of women and doubles their subsequent risk of cardiovascular disease. This study aimed to develop a machine learning-based model to predict postpartum cardiovascular risk in preeclamptic women.

Methods: We retrospectively collected demographic characteristics and clinical serum markers associated with preeclampsia during pregnancy from 907 preeclamptic women and used them to predict cardiovascular risk (ischemic heart disease, ischemic cerebrovascular disease, peripheral vascular disease, chronic kidney disease, metabolic system disease, or arterial hypertension). The study sample was randomly divided into a training set and a test set in an 8:2 ratio. Prediction models were developed with 5 different machine learning algorithms, including Random Forest. 10-fold cross-validation was performed on the training set, and model performance was evaluated on the test set.

Results: The cardiovascular risk outcome occurred in 186 (20.5%) of these women. Judged by area under the curve (AUC), the Random Forest algorithm performed best (AUC = 0.711 [95% CI: 0.697-0.726]) and was adopted for feature selection and for establishing the prediction model. The most important variables in the Random Forest algorithm included systolic blood pressure, urea nitrogen, neutrophil count, glucose, and D-dimer. The Random Forest model was well calibrated in the test set (Brier score = 0.133) and obtained the highest net benefit in the decision curve analysis.

Conclusion: Based on general patient characteristics and clinical variables, a new machine learning-based model was developed and verified for the individualized prediction of cardiovascular risk in post-preeclamptic women.
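The abstract reports two test-set metrics, the Brier score and the net benefit from decision curve analysis. The sketch below illustrates both. It is not the authors' code: the function names and the toy labels and probabilities are assumptions made for illustration, and the formulas are the standard textbook definitions rather than anything stated in the abstract.

```python
# Illustrative sketch only: hypothetical helper names and toy data,
# not the study's data or implementation.

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome.

    Lower is better; 0 means perfect probabilistic predictions.
    """
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit at one threshold probability, as used in decision curve analysis.

    Standard definition: TP/n - (FP/n) * threshold / (1 - threshold),
    where a patient is classified as high risk when y_prob >= threshold.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Toy example: 5 hypothetical patients (1 = outcome occurred).
y_true = [1, 0, 0, 1, 0]
y_prob = [0.8, 0.2, 0.4, 0.6, 0.1]

print(round(brier_score(y_true, y_prob), 3))
print(round(net_benefit(y_true, y_prob, 0.5), 3))
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds for each candidate model and comparing the resulting curves with the "treat all" and "treat none" strategies.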

Keywords: cardiovascular disease; hypertension; machine learning; model; prediction; preeclampsia.