A comprehensive and bias-free machine learning approach for risk prediction of preeclampsia with severe features in a nulliparous study cohort

BMC Pregnancy Childbirth. 2024 Dec 24;24(1):853. doi: 10.1186/s12884-024-06988-w.

Abstract

Preeclampsia is one of the leading causes of maternal morbidity, with consequences during and after pregnancy. Because of its diverse clinical presentation, preeclampsia is an adverse pregnancy outcome that is uniquely challenging to predict and manage. In this paper, we developed racial bias-free machine learning models that predict the onset of preeclampsia with severe features or eclampsia at discrete time points in a nulliparous pregnant study cohort. To focus on those most at risk, we selected probands with severe preeclampsia (sPE). Those with mild preeclampsia, superimposed preeclampsia, and new onset hypertension were excluded.

The prospective study cohort to which we applied machine learning is the Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-be (nuMoM2b) study, which contains information from eight clinical sites across the US. Maternal serum samples were collected from 1,857 individuals between the first and second trimesters; these individuals were selected as the final cohort.

Our prediction models achieved an AUROC of 0.72 (95% CI, 0.69-0.76), 0.75 (95% CI, 0.71-0.79), and 0.77 (95% CI, 0.74-0.80) for the three visits, respectively. Our initial models were biased toward non-Hispanic black participants, with a high predictive equality ratio of 1.31. We corrected this bias and reduced the ratio to 1.14, which lowers the rate of false positives in our predictive model for non-Hispanic black participants. The exact cause of the bias is still under investigation, but previous studies have recognized PlGF as a potential bias-inducing factor. However, since our model includes various factors that exhibit a positive correlation with PlGF, such as blood pressure measurements and BMI, we have employed an algorithmic approach to disentangle this bias from the model.

The top features of our model stress the importance of using several tests, particularly for biomarkers (BMI and blood pressure measurements) and ultrasound measurements. Placental analytes (PlGF and endoglin) were strong predictors for screening for the early onset of preeclampsia with severe features in the first two trimesters.
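
To make the fairness metric concrete, the following is a minimal sketch of how a predictive equality ratio can be computed. It assumes the common definition (the false positive rate in the group of interest divided by the false positive rate in a reference group), which matches the abstract's remark about false positives but is not confirmed as the authors' exact formula. The function names, group labels, and toy data are hypothetical and are not drawn from nuMoM2b.

```python
from typing import Hashable, Sequence


def false_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """FPR = FP / (FP + TN), computed over the true negatives only."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    if fp + tn == 0:
        raise ValueError("no true negatives; FPR is undefined")
    return fp / (fp + tn)


def predictive_equality_ratio(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    group: Sequence[Hashable],
    protected: Hashable,
    reference: Hashable,
) -> float:
    """Assumed definition: FPR(protected group) / FPR(reference group).

    A value of 1.0 means both groups receive false positives at the same rate.
    """
    def subset(label: Hashable):
        idx = [i for i, g in enumerate(group) if g == label]
        return [y_true[i] for i in idx], [y_pred[i] for i in idx]

    fpr_protected = false_positive_rate(*subset(protected))
    fpr_reference = false_positive_rate(*subset(reference))
    if fpr_reference == 0:
        raise ValueError("reference FPR is zero; ratio is undefined")
    return fpr_protected / fpr_reference


# Hypothetical toy data: 1 = predicted/observed sPE, 0 = no sPE.
y_true = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0, 0, 1]
group = ["ref"] * 5 + ["prot"] * 5

ratio = predictive_equality_ratio(y_true, y_pred, group, "prot", "ref")
print(ratio)  # 2.0 (FPR 0.50 in "prot" vs 0.25 in "ref")
```

Under this assumed definition, the reported change from 1.31 to 1.14 would mean that the false positive rates of the two groups moved closer together after the correction.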

Keywords: Ensemble model; Fairness in machine learning; Machine learning; PlGF; Preeclampsia; Preeclampsia with severe features.

MeSH terms

  • Adult
  • Biomarkers / blood
  • Cohort Studies
  • Female
  • Humans
  • Machine Learning*
  • Parity*
  • Placenta Growth Factor / blood
  • Pre-Eclampsia* / blood
  • Pre-Eclampsia* / diagnosis
  • Pregnancy
  • Pregnancy Trimester, First / blood
  • Pregnancy Trimester, Second / blood
  • Prospective Studies
  • Risk Assessment / methods
  • Risk Factors
  • Severity of Illness Index

Substances

  • Placenta Growth Factor
  • Biomarkers