Integrated View of Baseline Protein Expression in Human Tissues Using Public Data Independent Acquisition Data Sets

J Proteome Res. 2025 Jan 7. doi: 10.1021/acs.jproteome.4c00788. Online ahead of print.

Abstract

The PRIDE database is the largest public data repository of mass spectrometry-based proteomics data and currently stores more than 40,000 data sets covering a wide range of organisms, experimental techniques, and biological conditions. During the past few years, PRIDE has seen a significant increase in the number of submitted data-independent acquisition (DIA) proteomics data sets. This provides an excellent opportunity for large-scale data reanalysis and reuse. We have reanalyzed 15 public label-free DIA data sets covering various healthy human tissues to provide a state-of-the-art view of the human proteome in baseline conditions (without any perturbations). We computed baseline protein abundances and compared them across tissues, samples, and data sets. Our second aim was to compare the protein abundances obtained here with the results of previous analyses of human baseline data-dependent acquisition (DDA) data sets. We observed good correlation in some tissues, especially the liver and colon, but weak correlation in others, such as the lung and pancreas. The reanalyzed results, including protein abundance values and curated metadata, are available to view and download in Expression Atlas.
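The abstract does not say which correlation measure underlies the DIA versus DDA comparison. As a minimal sketch, assuming a Spearman rank correlation computed only over proteins quantified in both analyses of the same tissue, the comparison could look like the code below. The function names (`average_ranks`, `pearson`, `spearman_shared`) and the abundance values in the usage example are hypothetical illustrations, not data or code from the study.

```python
from typing import Dict, List


def average_ranks(values: List[float]) -> List[float]:
    """Rank values 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman_shared(dia: Dict[str, float], dda: Dict[str, float]) -> float:
    """Assumed approach: Spearman rho over proteins present in both tables."""
    shared = sorted(dia.keys() & dda.keys())
    if len(shared) < 2:
        raise ValueError("need at least two shared proteins")
    x = average_ranks([dia[p] for p in shared])
    y = average_ranks([dda[p] for p in shared])
    # Spearman rho is the Pearson correlation of the rank vectors.
    return pearson(x, y)
```

Usage with made-up log-scale abundances for one tissue (GAPDH is ignored because it appears in only one table):

```python
liver_dia = {"ALB": 9.1, "APOA1": 8.2, "CYP3A4": 6.5, "ACTB": 7.0}
liver_dda = {"ALB": 8.7, "APOA1": 8.0, "CYP3A4": 5.9, "ACTB": 7.3, "GAPDH": 7.1}
print(spearman_shared(liver_dia, liver_dda))  # 1.0: identical protein ordering
```

A rank-based measure is used here because it tolerates the different abundance scales that DIA and DDA pipelines typically produce; whether the study used this or another measure is not stated in the abstract.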

Keywords: Expression Atlas; PRIDE; baseline expression; data independent acquisition; data reanalysis; mass spectrometry; proteomics.