Integrated meta-analysis of colorectal cancer public proteomic datasets for biomarker discovery and validation

PLoS Comput Biol. 2024 Jan 22;20(1):e1011828. doi: 10.1371/journal.pcbi.1011828. eCollection 2024 Jan.

Abstract

The cancer biomarker field has been an object of thorough investigation over recent decades. Despite this, the heterogeneity of colorectal cancer (CRC) makes it challenging to identify and validate effective prognostic biomarkers for classifying patients according to outcome and treatment response. Although a massive amount of proteomics data has been deposited in public data repositories, this rich source of information remains vastly underused. Here, we attempted to reuse public proteomics datasets with two main objectives: (i) to generate hypotheses (detection of candidate biomarkers) for subsequent validation, and (ii) to validate, using an orthogonal approach, a previously described biomarker panel. Twelve public CRC proteomics datasets (mostly from the PRIDE database) were reanalysed and integrated to create a landscape of protein expression. Samples from both solid and liquid biopsies were included in the reanalysis. By integrating these data with survival annotation data, we validated in silico a six-gene signature for CRC classification at the protein level, and identified five new blood-detectable biomarkers (CD14, PPIA, MRC2, PRDX1, and TXNDC5) associated with CRC prognosis. The prognostic value of these blood-derived proteins was confirmed using additional public datasets, supporting their potential clinical value. In conclusion, this proof-of-concept study demonstrates the value of reusing public proteomics datasets as the basis for a useful resource for biomarker discovery and validation. The protein expression data have been made available in the public resource Expression Atlas.
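Linking protein expression to survival annotation is a common way to test whether a protein is prognostic. The abstract does not say which survival statistics were used. What follows is a minimal sketch that assumes one widespread approach: split patients at the median expression of a single protein and compare the two groups with a two-sample log-rank test. The function names (`logrank_test`, `median_split_logrank`) and the toy data are hypothetical and do not come from the study.

```python
import math
import statistics


def logrank_test(times, events, groups):
    """Two-group log-rank test (illustrative sketch, not the study's code).

    times:  follow-up time per patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    groups: 0 or 1 group label per patient
    Returns (chi-square statistic with 1 df, p-value).
    """
    obs_minus_exp = 0.0
    variance = 0.0
    # Visit each distinct time at which at least one event was observed
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        # Patients still at risk: follow-up time >= current event time
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if groups[i] == 1)
        d = sum(1 for i in at_risk if events[i] and times[i] == t)
        d1 = sum(1 for i in at_risk if groups[i] == 1 and events[i] and times[i] == t)
        # Observed minus expected events in group 1 at this time
        obs_minus_exp += d1 - d * n1 / n
        # Hypergeometric variance of d1 (ties handled by the (n - d)/(n - 1) factor)
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    stat = obs_minus_exp ** 2 / variance
    # Chi-square (1 df) upper tail: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def median_split_logrank(expression, times, events):
    """Assumed cut-off: patients above the median expression form group 1."""
    cutoff = statistics.median(expression)
    groups = [1 if x > cutoff else 0 for x in expression]
    return logrank_test(times, events, groups)


if __name__ == "__main__":
    # Hypothetical toy cohort (not data from the study)
    expression = [5.1, 4.8, 6.2, 7.0, 3.9, 6.8, 4.2, 7.5]
    times = [60, 55, 20, 12, 70, 15, 65, 10]  # months of follow-up
    events = [0, 0, 1, 1, 0, 1, 1, 1]          # 1 = death observed
    stat, p = median_split_logrank(expression, times, events)
    print(f"log-rank chi2 = {stat:.2f}, p = {p:.4f}")
```

A median cut-off is only one convention. Optimised cut-offs or a Cox model on continuous expression are frequent alternatives. In real cohorts a dedicated survival library would normally replace this hand-written test.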

Publication types

  • Meta-Analysis

MeSH terms

  • Biomarkers, Tumor / metabolism
  • Blood Proteins
  • Colorectal Neoplasms* / diagnosis
  • Colorectal Neoplasms* / genetics
  • Colorectal Neoplasms* / metabolism
  • Humans
  • Protein Disulfide-Isomerases
  • Proteomics*

Substances

  • Biomarkers, Tumor
  • Blood Proteins
  • TXNDC5 protein, human
  • Protein Disulfide-Isomerases

Grants and funding

This project was supported by grants RTI2018-095055-B-I00, PID2021-122227OB-I00 and PDC2022-133056-I00 from the Ministerio de Ciencia e Innovación (MCIN/AEI/10.13039/501100011033) using Next Generation EU/PRTR funds to JIC, and BBSRC grant number BB/T019670/1 and EMBL core funding to JAV. JR was supported by IND2019/BMD-17153 fellowship of the Comunidad de Madrid. The funders did not play any role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.