Comparisons of the prediction models for undiagnosed diabetes between machine learning versus traditional statistical methods

Sci Rep. 2023 Aug 11;13(1):13101. doi: 10.1038/s41598-023-40170-0.

Abstract

We compared the performance of machine learning-based models for predicting undiagnosed diabetes with that of traditional statistics-based models. We used data from the 2014-2020 Korean National Health and Nutrition Examination Survey (KNHANES) (N = 32,827). The KNHANES 2014-2018 data served as the training and internal validation sets, and the 2019-2020 data as the external validation sets. The area under the receiver operating characteristic curve (AUC) was used to compare the prediction performance of the two types of model. Using sex, age, resting heart rate, and waist circumference as features, the machine learning-based model showed a higher AUC than the traditional statistics-based model (0.788 vs. 0.740). Using sex, age, waist circumference, family history of diabetes, hypertension, alcohol consumption, and smoking status as features, the machine learning-based model again showed a higher AUC (0.802 vs. 0.759). With the feature set chosen for maximum prediction performance, the machine learning-based model also showed a higher AUC (0.819 vs. 0.765). Machine learning-based prediction models using anthropometric and lifestyle measurements may outperform traditional statistics-based prediction models in predicting undiagnosed diabetes.
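As an illustration of the AUC comparison described in the abstract, the following is a minimal sketch in pure Python. It computes the AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, counting ties as one half. The function name `roc_auc`, the labels, and the two score lists are hypothetical toy values, not KNHANES data, and the abstract does not state which software the authors used.

```python
def roc_auc(labels, scores):
    """Return the AUC: P(score of a positive > score of a negative),
    with tied scores counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    # Compare every positive case with every negative case.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = undiagnosed diabetes, 0 = no diabetes.
labels = [0, 0, 1, 1, 0, 1]
# Hypothetical predicted risks from two models on the same validation cases.
model_a = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
model_b = [0.3, 0.6, 0.2, 0.7, 0.5, 0.4]

auc_a = roc_auc(labels, model_a)  # 8 of 9 pairs ranked correctly
auc_b = roc_auc(labels, model_b)  # 4 of 9 pairs ranked correctly
print(f"model A AUC = {auc_a:.3f}, model B AUC = {auc_b:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for a toy example; for a survey-sized validation set, a rank-based computation or an established library routine would be the more practical choice.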

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Diabetes Mellitus* / diagnosis
  • Diabetes Mellitus* / epidemiology
  • Humans
  • Machine Learning
  • Models, Statistical
  • Nutrition Surveys
  • ROC Curve