Potential Biomarkers for Predicting the Risk of Developing Into Long COVID After COVID-19 Infection

Immun Inflamm Dis. 2025 Jan;13(1):e70137. doi: 10.1002/iid3.70137.

Abstract

Background: Long COVID, a heterogeneous condition characterized by a range of physical and neuropsychiatric presentations, develops in a proportion of individuals infected with COVID-19.

Methods: Transcriptomic data sets containing gene expression profiles of COVID-19 patients, long COVID patients, and healthy controls were retrieved from the GEO database. Differentially expressed genes (DEGs) for COVID-19 and long COVID were identified with R packages, and module detection was performed in parallel with the Modular Pharmacology Platform (http://112.86.129.72:48081/). The DEGs and differentially expressed module-genes (DEMGs) for long COVID and COVID-19 were integrated and intersected, followed by Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Gene Set Enrichment Analysis (GSEA).
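
The integration and intersection step can be illustrated with plain set operations. The sketch below assumes one possible reading of the Methods: for each condition, DEGs and DEMGs are pooled, and the two pools are then intersected. The function name `candidate_genes` and the placeholder gene labels are hypothetical and are not taken from the study.

```python
def candidate_genes(covid_degs, covid_demgs, long_degs, long_demgs):
    """Pool DEGs and DEMGs per condition, then keep genes shared by both pools.

    Assumption: "integration" means a per-condition union and "intersected"
    means the overlap between the COVID-19 pool and the long COVID pool.
    """
    covid_pool = set(covid_degs) | set(covid_demgs)
    long_pool = set(long_degs) | set(long_demgs)
    # Sort so the result is deterministic and easy to compare.
    return sorted(covid_pool & long_pool)


# Placeholder labels only; these are not genes reported in the abstract.
example = candidate_genes(
    covid_degs=["GENE_A", "GENE_B", "GENE_C"],
    covid_demgs=["GENE_C", "GENE_D"],
    long_degs=["GENE_B", "GENE_E"],
    long_demgs=["GENE_D", "GENE_F"],
)
print(example)  # ['GENE_B', 'GENE_D']
```

The shared genes would then be the input to the enrichment analyses (GO, KEGG, GSEA), which are not reproduced here.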

Results: We identified 11 and 62 differentially expressed modules, 1837 and 179 DEGs, and 103 and 508 DEMGs for COVID-19 and long COVID, respectively; these were notably enriched in immune-related signaling pathways. Immune cell infiltration in long COVID and COVID-19 was assessed separately and compared using the CIBERSORT, ssGSEA, and xCell algorithms. Hub genes were then screened using the SVM-RFE, RF, and XGBoost algorithms together with logistic regression analysis. Among the 67 candidate genes processed with the machine learning algorithms and logistic regression, a subgroup consisting of CEP55, CDCA2, MELK, and DEPDC1B was ultimately identified as potential biomarkers for predicting the risk of progression to long COVID after COVID-19 infection. The predictive performance of these biomarkers was quantified with a ROC value of 0.8762542, indicating that the combination of potential biomarkers provided the highest performance.
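
A ROC value of this kind is conventionally the area under the ROC curve (AUC); assuming that is what is reported here, the sketch below shows how an AUC can be computed from predicted risk scores. It uses the rank-based (Mann-Whitney) formulation: the AUC is the fraction of case/control pairs in which the case receives the higher score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are hypothetical and do not reproduce the study's data or its value.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for cases (e.g. long COVID), 0 for controls.
    scores: predicted risk, e.g. the output of a logistic regression model.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case ranked above control
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy data for illustration only.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(toy_labels, toy_scores)
print(round(auc, 4))  # 0.8889
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls.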

Conclusions: In summary, we identified a subgroup of potential biomarkers for predicting the risk of progression to long COVID after COVID-19 infection, which may partly elucidate the molecular mechanisms associated with long COVID.

Keywords: COVID‐19; immune cell infiltration; long COVID; machine learning algorithms; modular pharmacology platform.

MeSH terms

  • Biomarkers*
  • COVID-19* / genetics
  • COVID-19* / immunology
  • Gene Expression Profiling
  • Gene Ontology
  • Humans
  • Post-Acute COVID-19 Syndrome
  • SARS-CoV-2* / immunology
  • SARS-CoV-2* / physiology
  • Transcriptome

Substances

  • Biomarkers