Early diagnosis and treatment of myocardial infarction (MI) can significantly reduce the severity of the disease. However, disease data are often imbalanced, which can cause conventional models to produce poor predictions. Developing an MI risk prediction model from imbalanced datasets is therefore challenging. This paper presents 2GDNN-FL-Stacked, a novel model for predicting MI risk from imbalanced data. We mitigate the impact of data imbalance on the model by employing random under-sampling and cost-sensitive techniques. We improve the model's identification capability by stacking 2GDNN-FL, CatBoost, RandomForest, and LightGBM into a single ensemble. Compared with several baseline models, our model's Matthews Correlation Coefficient (MCC), F1-score, and Area Under the ROC Curve (AUC) increased by 0.87% to 15.70%, 0.55% to 9.81%, and 0.75% to 8.11%, respectively, a significant improvement over the performance of single models on imbalanced datasets. Ablation experiments show that removing any one component degrades model performance, confirming that every component contributes. The method offers new insights into heart attack risk prediction and could provide valuable support for clinical decision-making.
Keywords: Cost-sensitive learning; Imbalanced data; Myocardial infarction risk prediction; Stacking model; Twice-growth deep neural network.
© The Author(s), under exclusive licence to Springer Nature Switzerland AG 2025. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
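The abstract combines two standard imbalance-handling tactics, random under-sampling and cost-sensitive weighting, and reports MCC as a headline metric. The following is a minimal sketch of these generic techniques in plain Python, assuming binary labels (1 = MI event). It is not the authors' 2GDNN-FL-Stacked pipeline: the function names, the hypothetical `ratio` parameter, and the "balanced" weighting heuristic (the same formula scikit-learn uses for `class_weight="balanced"`) are assumptions chosen for illustration, and the toy data are synthetic.

```python
import math
import random
from collections import Counter


def random_undersample(X, y, ratio=1.0, seed=0):
    """Randomly drop majority-class rows from a binary dataset.

    `ratio` is a hypothetical knob (not from the paper): the number of
    majority rows kept is ratio * (number of minority rows), capped at
    the size of the majority class. All minority rows are kept.
    """
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    # sorted() is stable, so equal counts still yield two distinct labels
    minority, majority = sorted(counts, key=counts.get)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label == majority]
    n_keep = min(len(majority_idx), int(ratio * len(minority_idx)))
    rng = random.Random(seed)
    kept = sorted(minority_idx + rng.sample(majority_idx, n_keep))  # keep row order
    return [X[i] for i in kept], [y[i] for i in kept]


def balanced_class_weights(y):
    """Cost-sensitive weights: n_samples / (n_classes * count_of_class).

    Rarer classes receive proportionally larger weights, so misclassifying
    them costs more in a weighted loss.
    """
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC is 0 when any confusion-matrix marginal is empty
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Synthetic toy data: 95 negatives, 5 positives (1 = MI event)
    X = [[float(i)] for i in range(100)]
    y = [0] * 95 + [1] * 5
    Xs, ys = random_undersample(X, y, ratio=2.0, seed=42)
    print(Counter(ys))  # 10 negatives, 5 positives
    print(balanced_class_weights(y))  # minority weight is ~19x the majority weight
    print(mcc([1, 1, 0, 0], [1, 0, 0, 0]))
```

Under-sampling discards majority rows, whereas class weights keep every row but scale each row's contribution to the loss; using both, as the abstract describes, allows a milder sampling ratio to be paired with a weighted loss. MCC is shown because, unlike accuracy, it stays low when a model simply predicts the majority class.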