Deriving the probabilities of water loss and ammonia loss for amino acids from tandem mass spectra

Shiwei Sun; Chungong Yu; Yantao Qiao; Yu Lin; Gongjin Dong; Changning Liu; Jingfen Zhang; Zhuo Zhang; Jinjin Cai; Hong Zhang; Dongbo Bu

doi:10.1021/pr070479v

J Proteome Res. 2008 Jan;7(1):202-8. doi: 10.1021/pr070479v. Epub 2007 Dec 20.

Authors

Shiwei Sun¹, Chungong Yu, Yantao Qiao, Yu Lin, Gongjin Dong, Changning Liu, Jingfen Zhang, Zhuo Zhang, Jinjin Cai, Hong Zhang, Dongbo Bu

Affiliation

¹ Bioinformatics Group, Center for Advanced Computing Research, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China.

PMID: 18092745
DOI: 10.1021/pr070479v

Abstract

In protein identification by tandem mass spectrometry, accurately predicting the theoretical spectrum of a peptide sequence is critical. Widely used prediction models, such as SEQUEST and MASCOT, ignore the intensities of ions with important neutral losses, including water loss and ammonia loss. Ignoring these neutral losses, however, causes a significant deviation between the predicted theoretical spectrum and its experimental counterpart. Here, based on the "one peak, multiple explanations" observation, we propose an expectation-maximization (EM) method that automatically learns the probabilities of water loss and ammonia loss for each amino acid. We then use these probabilities to design an improved statistical model for theoretical spectrum prediction. We implemented these methods and tested them on real data. On a training set of 1803 spectra, the derived probabilities agree well with established knowledge about neutral losses, such as the tendency of Asp, Glu, Ser, and Thr to lose water. On a testing set of 941 spectra, the improved similarity between experimental and predicted spectra demonstrates that this method generates more reasonable predictions than a model that ignores neutral losses. As one application of the derived probabilities, we implemented a database search method that adopts the improved theoretical spectrum model with estimated neutral-loss ions. Results on Keller's data set demonstrate that this method identifies peptides more accurately than SEQUEST. In a second application, validating SEQUEST's results, the reported peptide-spectrum pairs are reranked by the similarity between experimental and predicted spectra. Results on both LTQ and QSTAR data sets suggest that this reranking strategy can effectively distinguish the false negative predictions reported by SEQUEST.
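The abstract does not spell out the EM formulation, so the following Python sketch is only an illustration of the "one peak, multiple explanations" idea under simplifying assumptions, not the authors' model. It assumes each fragment is reduced to its residue string plus a flag saying whether a water-loss peak was observed, and that residues lose water independently (a noisy-OR model). The function name `em_neutral_loss`, the input format, and the toy data are all hypothetical.

```python
from collections import defaultdict
from math import prod


def em_neutral_loss(fragments, n_iter=200, init=0.1):
    """Estimate a per-residue loss probability with EM (illustrative only).

    fragments: list of (residues, loss_observed) pairs, where `residues`
    is a string of one-letter amino acid codes in the fragment and
    `loss_observed` says whether a neutral-loss peak was seen for it.
    Assumption: residues lose water independently (noisy-OR).
    """
    alphabet = {aa for residues, _ in fragments for aa in residues}
    p = {aa: init for aa in alphabet}
    for _ in range(n_iter):
        expected = defaultdict(float)  # expected number of losses per residue
        total = defaultdict(int)       # number of occurrences per residue
        for residues, observed in fragments:
            # Probability that at least one residue in the fragment loses water.
            p_any = 1.0 - prod(1.0 - p[aa] for aa in residues)
            for aa in residues:
                total[aa] += 1
                # E-step: an observed loss peak has several possible
                # explanations; share the credit among the candidate residues.
                if observed and p_any > 0.0:
                    expected[aa] += p[aa] / p_any
        # M-step: re-estimate each probability from the expected counts.
        p = {aa: expected[aa] / total[aa] for aa in alphabet}
    return p


# Hypothetical toy data: (fragment residues, water-loss peak observed?)
toy = [("S", True), ("S", True), ("S", False),
       ("G", False), ("G", False), ("SG", True), ("GG", False)]
probs = em_neutral_loss(toy)
```

In this toy input, the loss seen for the `SG` fragment is ambiguous on its own, but the single-residue fragments let EM attribute it to serine rather than glycine. Real spectra would additionally require peak matching, intensity modeling, and separate treatment of water and ammonia losses, none of which is covered here.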

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Amino Acids / chemistry*
Ammonia / chemistry*
Databases, Factual
Peptides / analysis*
Probability
Software
Tandem Mass Spectrometry / methods*
Water / chemistry*

Substances

Amino Acids
Peptides
Water
Ammonia