Background: Glioblastoma multiforme (GBM) is a highly aggressive brain cancer with poor prognosis and limited treatment options. Despite advances in understanding its molecular mechanisms, effective therapeutic strategies remain elusive due to the tumor's genetic complexity and heterogeneity.
Methods: This study used an integrative approach that combined 113 machine learning algorithms with Mendelian Randomization (MR) analysis to investigate the molecular underpinnings of GBM. Five publicly available gene expression datasets were analyzed to identify differentially expressed genes (DEGs) associated with GBM. Weighted Gene Co-expression Network Analysis (WGCNA) was used to identify GBM-related gene modules. Gene set enrichment analysis (GSEA) and gene set variation analysis (GSVA) were then conducted to explore the biological pathways involved. The predictive accuracy of the machine learning models was evaluated using Receiver Operating Characteristic (ROC) curves and confusion matrices, and the best-performing model was validated in external datasets. MR analysis was performed to assess causal relationships between genetically predicted gene expression levels and GBM risk.
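The evaluation step described above (ROC curves and confusion matrices) can be illustrated with a short Python sketch. It is not the authors' pipeline: the labels and scores are hypothetical, the AUC uses the rank-based (Mann-Whitney) formulation, and the 0.5 decision threshold is an assumption. In practice the scores would come from a fitted classifier such as the Ridge regression model.

```python
# Minimal sketch of ROC AUC and confusion-matrix evaluation for a binary
# GBM-vs-normal classifier. Labels/scores are hypothetical (1 = GBM, 0 = normal).
from typing import List, Tuple


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney formulation:
    the probability that a random positive outscores a random negative
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_matrix(labels: List[int], scores: List[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Counts (TP, FP, TN, FN) after calling a sample positive if score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of samples classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


# Hypothetical example: four GBM and four normal samples.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
cm = confusion_matrix(labels, scores)
print("AUC:", roc_auc(labels, scores))  # 14 of 16 pairs ranked correctly -> 0.875
print("TP, FP, TN, FN:", cm)            # (3, 1, 3, 1)
print("accuracy:", accuracy(*cm))       # 0.75
```

The AUC is threshold-free (it only depends on how scores rank positives against negatives), whereas accuracy from the confusion matrix depends on the chosen cut-off, which is why the two metrics are usually reported together.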
Results: The study identified 286 DEGs between GBM and adjacent normal tissues across the five datasets. WGCNA highlighted the yellow module as the most relevant to GBM; this module contained key genes such as KLHL3, FOXO4, and MAP1A. Of the 113 machine learning models tested, Ridge regression achieved the highest area under the curve (AUC), 0.92, indicating robust predictive accuracy. The model reached a classification accuracy of 89.5% in the training set and 85.3% in the external validation sets, supporting its reliability. MR analysis provided strong evidence of a causal relationship between the expression levels of the identified genes and GBM risk.
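The abstract does not state which MR estimator was used. As an illustrative assumption, the sketch below computes the fixed-effect inverse-variance weighted (IVW) estimate, a common primary MR estimator, from hypothetical per-SNP summary statistics: `beta_x` (SNP effect on gene expression), `beta_y` (SNP effect on GBM), and `se_y` (standard error of `beta_y`). Function names and numbers are hypothetical.

```python
# Minimal sketch of a fixed-effect inverse-variance weighted (IVW) MR estimate
# from per-SNP summary statistics. All numbers are hypothetical.
import math
from typing import List, Tuple


def wald_ratios(beta_x: List[float], beta_y: List[float]) -> List[float]:
    """Per-SNP causal estimates: SNP-outcome effect / SNP-exposure effect."""
    return [by / bx for bx, by in zip(beta_x, beta_y)]


def ivw_estimate(beta_x: List[float], beta_y: List[float],
                 se_y: List[float]) -> Tuple[float, float, float]:
    """Return (estimate, standard error, two-sided p-value).

    Equivalent to a weighted mean of the Wald ratios with weights
    beta_x**2 / se_y**2 (first-order weights; uncertainty in beta_x ignored).
    """
    if not beta_x or not (len(beta_x) == len(beta_y) == len(se_y)):
        raise ValueError("inputs must be non-empty and of equal length")
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_x, se_y))
    estimate = num / den
    se = math.sqrt(1.0 / den)
    z = estimate / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return estimate, se, p_value


# Hypothetical instruments: effect on gene expression (beta_x) and on GBM (beta_y).
beta_x = [0.1, 0.2]
beta_y = [0.05, 0.2]
se_y = [0.1, 0.1]
print(wald_ratios(beta_x, beta_y))          # [0.5, 1.0]
print(ivw_estimate(beta_x, beta_y, se_y))   # estimate ~0.9, SE ~0.447
```

Because each SNP's Wald ratio is weighted by `beta_x**2 / se_y**2`, instruments with strong effects on expression and precise outcome estimates dominate the pooled estimate; here the second SNP carries four times the weight of the first.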
Conclusions: This study demonstrates the value of combining machine learning and Mendelian Randomization to uncover novel genetic markers for GBM. The identified genes show promise as biomarkers for GBM diagnosis and therapy, opening new avenues for personalized treatment strategies.
Keywords: Gene co-expression analysis; Glioblastoma multiforme; Machine learning; Mendelian randomization.
© 2025. The Author(s).