Moving Toward Metaproteogenomics: A Computational Perspective on Analyzing Microbial Samples via Proteogenomics

Methods Mol Biol. 2025:2859:297-318. doi: 10.1007/978-1-0716-4152-1_17.

Abstract

Microbial sample analysis has received growing attention over the last decade, driven by important findings in microbiome research and promising applications in biotechnology. Modern mass spectrometry-based methodology has been established in this context, providing sufficient sensitivity, resolution, dynamic range, and throughput to analyze the so-called metaproteome of complex microbial mixtures from clinical or environmental samples. While proteomic analyses were previously restricted to common model organisms, next-generation sequencing technologies now allow rapid and cost-efficient characterization of whole metagenomes of microbial consortia and of specific genomes from non-model organisms, a group to which microbes contribute a significant share. This proteogenomic approach, that is, the combined application of genomic and proteomic methods, enables researchers to create a protein database that serves as a tailored blueprint of the microbial sample under investigation. This contribution provides an overview of the computational challenges and opportunities in proteogenomics and metaproteomics as of January 2018. For practical application, we first showcase an integrative proteogenomic method that circumvents existing reference databases by creating sample-specific transcripts. The underlying algorithm uses a graph network approach that combines RNA-Seq and peptide information. As a second example, we provide a tutorial for a simulation tool that estimates the computational limits of detecting microbial non-model organisms. This method evaluates the potential influence of error-tolerant searches and proteogenomic approaches on databases of interest. Finally, we discuss recommendations for developing future strategies that may help overcome present limitations by combining the strengths of genome- and proteome-based methods and moving toward an integrated metaproteogenomics approach.

Keywords: Bacteria; Bioinformatics; Computational proteomics; Metagenomics; Metaproteomics; Microbes; Microbial community samples; Protein identification; Proteogenomics; Tandem mass spectrometry.

MeSH terms

  • Algorithms
  • Computational Biology / methods
  • Databases, Protein
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Mass Spectrometry / methods
  • Metagenome
  • Metagenomics / methods
  • Microbiota* / genetics
  • Proteogenomics* / methods
  • Proteome / genetics
  • Proteomics / methods
  • Software

Substances

  • Proteome