Predicting Gestational Diabetes Mellitus in the first trimester using machine learning algorithms: a cross-sectional study at a hospital fertility health center in Iran

Somayeh Kianian Bigdeli; Marjan Ghazisaedi; Seyed Mohammad Ayyoubzadeh; Sedigheh Hantoushzadeh; Marjan Ahmadi

doi:10.1186/s12911-024-02799-3

BMC Med Inform Decis Mak. 2025 Jan 3;25(1):3. doi: 10.1186/s12911-024-02799-3.

Authors

Somayeh Kianian Bigdeli¹, Marjan Ghazisaedi², Seyed Mohammad Ayyoubzadeh³, Sedigheh Hantoushzadeh⁴, Marjan Ahmadi⁵

Affiliations

¹ Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran.
² Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran. ghazimar@tums.ac.ir.
³ Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran. smayyoubzadeh@tums.ac.ir.
⁴ Vali-E-Asr Reproductive Health Research Center, Family Health Research Institute, Imam Khomeini Hospital Complex, Tehran University of Medical Sciences, Tehran, Iran.
⁵ Department of Obstetrics and Gynecology, Tehran University of Medical Sciences, Tehran, Iran.

Abstract

Background: Gestational Diabetes Mellitus (GDM) is a common complication during pregnancy. Late diagnosis can have significant implications for both the mother and the fetus. This research aims to create an early prediction model for GDM in the first trimester of pregnancy. This model will help obstetricians and gynecologists make appropriate decisions for treating and preventing GDM complications.

Methods: This applied descriptive study was conducted at the fertility health center of Vali-e-Asr Hospital in Tehran, Iran. The dataset was collected from the records of pregnant women registered in the hospital's system from 2020 to 2022. Risk factors for the predictive models were identified through a literature review and input from experts and clinical specialists. The extracted data were preprocessed, and six machine learning (ML) methods were developed and evaluated for GDM prediction in the first trimester of pregnancy: decision tree (DT), multilayer perceptron (MLP), k-nearest neighbors (KNN), Naïve Bayes (NB), random forest (RF), and extreme gradient boosting (XGBoost).
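
As an illustration only, a comparison of the six model families named in the Methods could be set up along the following lines. This is a minimal sketch, not the authors' pipeline: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the hospital records, and every hyperparameter shown is an arbitrary placeholder. If the `xgboost` package is not installed, scikit-learn's `GradientBoostingClassifier` is swapped in as a stand-in for XGBoost.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): NOT the study's dataset.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Use XGBoost when available; otherwise fall back to a different
# gradient-boosting implementation (a substitute, not XGBoost itself).
try:
    from xgboost import XGBClassifier
    boosted = XGBClassifier(n_estimators=100, random_state=0)
except ImportError:
    boosted = GradientBoostingClassifier(random_state=0)

# Hyperparameters are illustrative placeholders, not the paper's settings.
# MLP and KNN are distance/gradient sensitive, so they get feature scaling.
models = {
    "DT": DecisionTreeClassifier(max_depth=5, random_state=0),
    "MLP": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0),
    ),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "NB": GaussianNB(),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "XGBoost": boosted,
}

# Fit each model and score it on the held-out split by AUC.
auc_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    positive_scores = model.predict_proba(X_test)[:, 1]
    auc_by_model[name] = roc_auc_score(y_test, positive_scores)

for name, auc in sorted(auc_by_model.items(), key=lambda kv: -kv[1]):
    print(f"{name}: AUC = {auc:.3f}")
```

The ranking printed here reflects only the synthetic data and says nothing about the study's findings.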

Results: Models were evaluated using accuracy, precision, sensitivity (recall), and the area under the receiver operating characteristic curve (AUC). In modeling based on glucose tolerance test (GTT) results, the RF model achieved the best performance in GDM prediction, with 89% accuracy, 86% precision, 92% recall, and an AUC of 0.94, using demographic variables, medical history, and clinical findings. In modeling based on insulin consumption, the RF model again achieved the best results, with 62% accuracy, 60% precision, 63% recall, and an AUC of 0.63, using demographic variables and medical history.
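
For readers less familiar with the four reported metrics, the sketch below shows how each is computed from true labels and predicted scores. It uses only the Python standard library; the function names, the toy labels and scores, and the 0.5 decision threshold are illustrative assumptions and are unrelated to the study's data.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall (sensitivity) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Guard against division by zero when nothing is predicted positive
        # or when there are no positive cases.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def auc_from_scores(y_true, scores):
    """AUC as the probability that a random positive case outscores a
    random negative case (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 1 = GDM, 0 = no GDM.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = auc_from_scores(y_true, scores)
print(metrics)  # accuracy, precision, and recall are each 0.75 here
print(auc)      # 0.8125
```

Accuracy, precision, and recall depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why a model can have a moderate accuracy and a noticeably different AUC.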

Conclusion: The results of this study indicate that ML algorithms, especially RF, can predict GDM during the first trimester of pregnancy with acceptable accuracy: 89% in modeling based on GTT results, though only 62% in modeling based on insulin consumption.

Keywords: Artificial intelligence; First trimester of pregnancy; Gestational diabetes mellitus; Machine learning; Prediction; Random forest.

MeSH terms

Adult
Algorithms
Cross-Sectional Studies
Diabetes, Gestational* / diagnosis
Female
Humans
Iran
Machine Learning*
Pregnancy
Pregnancy Trimester, First*