PILOT_PROTEIN: identification of unmodified and modified proteins via high-resolution mass spectrometry and mixed-integer linear optimization

J Proteome Res. 2012 Sep 7;11(9):4615-29. doi: 10.1021/pr300418j. Epub 2012 Jul 26.

Abstract

A novel protein identification framework, PILOT_PROTEIN, has been developed to construct a comprehensive list of all unmodified proteins that are present in a living sample. It uses the peptide identification results from the PILOT_SEQUEL algorithm to initially determine all unmodified proteins within the sample. Using a rigorous biclustering approach that groups incorrect peptide sequences with other homologous sequences, the number of false positives reported is minimized. A sequence tag procedure is then incorporated along with the untargeted PTM identification algorithm, PILOT_PTM, to determine a list of all modification types and sites for each protein. The unmodified protein identification algorithm, PILOT_PROTEIN, is compared to the methods SEQUEST, InsPecT, X!Tandem, VEMS, and ProteinProspector using both prepared protein samples and a more complex chromatin digest. The algorithm demonstrates superior protein identification accuracy with a lower false positive rate. All materials are freely available to the scientific community at http://pumpd.princeton.edu.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Cluster Analysis
  • Databases, Protein
  • Peptide Fragments / chemistry
  • Protein Processing, Post-Translational
  • Proteins / chemistry*
  • Proteomics
  • ROC Curve
  • Sequence Analysis, Protein / methods*
  • Software*
  • Tandem Mass Spectrometry*

Substances

  • Peptide Fragments
  • Proteins