Evaluating Feature Selection Methods for Accurate Diagnosis of Diabetic Kidney Disease

Biomedicines. 2024 Dec 16;12(12):2858. doi: 10.3390/biomedicines12122858.

Abstract

Background/Objectives: The rise in the number of patients with type 2 diabetes, together with the complications caused by the disease, is an alarming trend for the health sector. One of the main complications of diabetes is nephropathy, which is also the main cause of kidney failure. In Mexican patients, kidney damage is already advanced by the time it is diagnosed, which is why acting preventively is extremely important. The aim of this research is to compare distinct feature selection methodologies to identify discriminant risk factors that may be beneficial for early treatment and prevention. Methods: This study evaluated a Mexican dataset collected from 22 patients and containing 32 attributes. To reduce dimensionality and select the most important variables, four feature selection algorithms were implemented: Univariate, Boruta, Galgo, and Elastic net. The features selected by each method were then used to train a random forest classifier, yielding four models. Results: Galgo with random forest achieved the best performance with only three predictors: "creatinine", "urea", and "lipids treatment". The model displayed moderate classification performance, with an area under the curve of 0.80 (±0.3535 SD), a sensitivity of 0.909, and a specificity of 0.818. Conclusions: The results indicate that the proposed methodology has the potential to facilitate the prompt identification of nephropathy and non-nephropathy patients and could therefore be used in clinical settings as a preliminary computer-aided diagnosis tool.
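To make the reported sensitivity and specificity concrete, the following minimal Python sketch computes both from confusion-matrix counts. The counts are an assumption, not data from the study: the abstract does not state the class split, and a balanced split of 11 nephropathy and 11 non-nephropathy patients is only one allocation of the 22 patients that reproduces the reported values after rounding. The function names are illustrative.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: share of nephropathy patients correctly flagged."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: share of non-nephropathy patients correctly cleared."""
    return tn / (tn + fp)


# ASSUMPTION: hypothetical counts for a balanced 11/11 split of the 22 patients.
# The abstract reports only the resulting rates, not these counts.
tp, fn = 10, 1  # nephropathy patients: detected / missed
tn, fp = 9, 2   # non-nephropathy patients: cleared / falsely flagged

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)

print(f"sensitivity = {sens:.3f}")  # sensitivity = 0.909
print(f"specificity = {spec:.3f}")  # specificity = 0.818
```

Under this assumed split, a single additional misclassified patient would shift either rate by about 0.09, which illustrates how coarse these estimates are for a sample of this size.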

Keywords: diabetic kidney disease; feature selection algorithms; machine learning; random forest; risk factors.

Grants and funding

This research received no external funding.