Deep learning the collisional cross sections of the peptide universe from a million experimental values

Nat Commun. 2021 Feb 19;12(1):1185. doi: 10.1038/s41467-021-21352-8.

Abstract

The size and shape of peptide ions in the gas phase are an under-explored dimension for mass spectrometry-based proteomics. To investigate the nature and utility of the peptide collisional cross section (CCS) space, we measure more than a million data points from whole-proteome digests of five organisms with trapped ion mobility spectrometry (TIMS) and parallel accumulation-serial fragmentation (PASEF). The scale and precision (CV < 1%) of our data are sufficient to train a deep recurrent neural network that accurately predicts CCS values based solely on the peptide sequence. Cross section predictions for the synthetic ProteomeTools peptides validate the model within a 1.4% median relative error (R > 0.99). Hydrophobicity, the proportion of prolines and the position of histidines are the main determinants of the cross sections, in addition to sequence-specific interactions. CCS values can now be predicted for any peptide and organism, forming a basis for advanced proteomics workflows that make full use of the additional information.
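The abstract quantifies measurement precision as a coefficient of variation (CV) and prediction accuracy as a median relative error with a correlation coefficient R. The snippet below is a minimal sketch of how such figures are commonly computed, assuming the textbook definitions: CV as sample standard deviation over mean of replicate CCS values, relative error as |predicted - observed| / observed, and R as the Pearson correlation. The paper's exact definitions may differ, and all function names and numbers here are hypothetical illustrations, not data from the study.

```python
from math import sqrt
from statistics import mean, median, stdev


def coefficient_of_variation(replicates):
    """CV in percent: sample standard deviation / mean of repeated CCS measurements
    of one peptide ion (assumed definition)."""
    return 100.0 * stdev(replicates) / mean(replicates)


def median_relative_error(predicted, observed):
    """Median of |predicted - observed| / observed over all peptides, in percent."""
    errors = [abs(p - o) / o for p, o in zip(predicted, observed)]
    return 100.0 * median(errors)


def pearson_r(x, y):
    """Pearson correlation between predicted and observed CCS values
    (assumes R in the abstract denotes Pearson's r)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical CCS values in square angstroms, for illustration only.
print(coefficient_of_variation([400.0, 402.0, 398.0]))  # 0.5
predicted = [404.0, 495.0, 610.0]
observed = [400.0, 500.0, 600.0]
print(median_relative_error(predicted, observed))  # 1.0
print(pearson_r(predicted, observed))
```

Using the median rather than the mean makes the reported error robust to a small number of poorly predicted outlier peptides.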

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Caenorhabditis elegans
  • Deep Learning*
  • Drosophila melanogaster
  • Escherichia coli
  • HeLa Cells
  • Humans
  • Ions
  • Neural Networks, Computer
  • Peptides / chemistry*
  • Proteome / analysis*
  • Proteomics / methods*
  • Saccharomyces cerevisiae
  • Tandem Mass Spectrometry / methods*
  • Workflow

Substances

  • Ions
  • Peptides
  • Proteome