Identifying cancer prognosis genes through causal learning

Brief Bioinform. 2024 Nov 22;26(1):bbae721. doi: 10.1093/bib/bbae721.

Abstract

Accurate identification of causal genes for cancer prognosis is critical for estimating disease progression and guiding treatment interventions. In this study, we propose CPCG (Cancer Prognosis's Causal Gene), a two-stage framework that identifies gene sets causally associated with patient prognosis across diverse cancer types from transcriptomic data. In the first stage, an ensemble approach models the impact of gene expression on survival using parametric and semiparametric hazard models. In the second stage, an iterative conditional independence test combined with graph pruning is used to infer the causal skeleton and thereby pinpoint prognosis-related genes. Experiments on transcriptomic data from 18 cancer types sourced from The Cancer Genome Atlas Project demonstrate CPCG's effectiveness in predicting prognosis under four evaluation metrics. Validations on 24 additional datasets covering 12 cancer types from the Gene Expression Omnibus and the Chinese Glioma Genome Atlas Project further demonstrate CPCG's robustness and generalizability. CPCG identifies a concise but reliable set of genes, obviating the need to enumerate gene combinations when estimating survival time. These genes are also shown to be closely linked to crucial biological processes in cancer. Moreover, CPCG constructs a stable causal skeleton and is insensitive to the order in which the input data are shuffled. Overall, CPCG is a powerful tool for extracting cancer prognostic biomarkers, offering interpretability, generalizability, and robustness. CPCG holds promise for facilitating targeted interventions in clinical treatment strategies.

Keywords: cancer prognosis; causal structure learning; compact gene set; generalizable and robust predictions; transcriptomic data.
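
The abstract describes inferring a causal skeleton with iterative conditional independence tests and graph pruning. The sketch below illustrates that general idea only and is not CPCG's implementation. It assumes a Fisher z test on partial correlations, a continuous outcome standing in for survival, and synthetic data. All function names, the significance level, and the conditioning-set limit are hypothetical choices made for illustration. Candidate variables are tested against a neighbor set that is frozen at each conditioning-set size, so the result does not depend on the order in which variables are visited.

```python
import itertools
import math
import random


def corr_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[x - m for x in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    d = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j]) for j in range(d)] for i in range(d)]


def partial_corr(corr, i, j, cond):
    """Partial correlation of i and j given cond, via the recursive formula."""
    if not cond:
        return corr[i][j]
    k, rest = cond[0], cond[1:]
    r_ij = partial_corr(corr, i, j, rest)
    r_ik = partial_corr(corr, i, k, rest)
    r_jk = partial_corr(corr, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def ci_p_value(corr, n, i, j, cond):
    """Two-sided p-value of the Fisher z test for i independent of j given cond."""
    r = max(min(partial_corr(corr, i, j, cond), 0.999999), -0.999999)
    z = math.atanh(r) * math.sqrt(n - len(cond) - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def prune_to_neighbors(columns, target, max_cond=2, alpha=0.01):
    """Drop every variable that some conditioning set separates from target."""
    n = len(columns[0])
    corr = corr_matrix(columns)
    adj = [v for v in range(len(columns)) if v != target]
    for size in range(max_cond + 1):
        frozen = list(adj)  # fixed neighbor set for this level
        removed = set()
        for v in frozen:
            others = [u for u in frozen if u != v]
            for cond in itertools.combinations(others, size):
                if ci_p_value(corr, n, target, v, cond) > alpha:
                    removed.add(v)  # independence not rejected: prune the edge
                    break
        adj = [v for v in adj if v not in removed]
    return adj


# Synthetic example (assumed structure): x1 -> x0 -> y, and x2 is pure noise.
random.seed(0)
n_samples = 1000
x1 = [random.gauss(0, 1) for _ in range(n_samples)]
x0 = [a + random.gauss(0, 1) for a in x1]
x2 = [random.gauss(0, 1) for _ in range(n_samples)]
y = [a + random.gauss(0, 1) for a in x0]
columns = [x0, x1, x2, y]

selected = prune_to_neighbors(columns, target=3)
print("variables kept as neighbors of the outcome:", selected)
```

In this toy setting the directly acting variable (column 0) should survive pruning, while the upstream variable (column 1) is expected to be separated from the outcome once column 0 is conditioned on. A survival-specific pipeline would need to replace the continuous outcome and the Gaussian test with something suited to censored data.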

MeSH terms

  • Algorithms
  • Biomarkers, Tumor* / genetics
  • Computational Biology / methods
  • Databases, Genetic
  • Gene Expression Profiling / methods
  • Gene Expression Regulation, Neoplastic
  • Humans
  • Machine Learning
  • Neoplasms* / genetics
  • Prognosis
  • Transcriptome

Substances

  • Biomarkers, Tumor