Estimating the statistical significance of peptide identifications from shotgun proteomics experiments

J Proteome Res. 2007 May;6(5):1758-67. doi: 10.1021/pr0605320. Epub 2007 Mar 31.

Abstract

We present a wrapper-based approach to estimate and control the false discovery rate (FDR) for peptide identifications using the outputs of multiple commercially available MS/MS search engines. The approach can combine output from several search engines with sequence- and spectrum-derived features in a flexible classification model that produces a score associated with correct peptide identifications. The distribution of this classification-model score from a search against a reversed database is taken as the null distribution for estimating p-values and false discovery rates with a simple, established statistical procedure. Results from 10 analyses of rat sera on an LTQ-FT mass spectrometer indicate that the method is well calibrated for controlling the proportion of false positives in a set of reported peptide identifications, while correctly identifying more peptides than rule-based methods that use a single search engine.
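The abstract does not name the "simple and established statistical procedure". Purely as an illustration, the sketch below assumes two common choices: empirical p-values computed against the reversed-database (decoy) classifier scores, followed by the Benjamini-Hochberg step-up adjustment. The function names are hypothetical, higher scores are assumed to mean a more likely correct identification, and the (count + 1) / (n + 1) p-value convention is an added assumption, not something the abstract specifies.

```python
from bisect import bisect_left
from typing import List, Sequence


def empirical_p_values(target_scores: Sequence[float],
                       decoy_scores: Sequence[float]) -> List[float]:
    """Hypothetical helper: p-value of each target score against the decoy null.

    p = (number of decoy scores >= target score + 1) / (number of decoys + 1),
    so no p-value is exactly zero (an assumed convention).
    """
    null = sorted(decoy_scores)
    n = len(null)
    pvals = []
    for s in target_scores:
        ge = n - bisect_left(null, s)  # count of decoy scores >= s
        pvals.append((ge + 1) / (n + 1))
    return pvals


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def accept_at_fdr(target_scores: Sequence[float],
                  decoy_scores: Sequence[float],
                  alpha: float = 0.05) -> List[int]:
    """Hypothetical helper: indices of target identifications with adjusted p <= alpha."""
    q = benjamini_hochberg(empirical_p_values(target_scores, decoy_scores))
    return [i for i, qi in enumerate(q) if qi <= alpha]


if __name__ == "__main__":
    targets = [0.9, 0.35, 0.05]
    decoys = [0.1, 0.2, 0.3, 0.4]
    print(empirical_p_values(targets, decoys))
    print(accept_at_fdr(targets, decoys, alpha=0.7))
```

With only four decoy scores the smallest achievable p-value under this convention is 1/5, so a realistic analysis needs a decoy search large enough that the null distribution resolves small p-values.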

MeSH terms

  • Adult
  • Amino Acid Sequence
  • Animals
  • Calibration
  • False Positive Reactions
  • Humans
  • Male
  • Mass Spectrometry / instrumentation
  • Mass Spectrometry / methods*
  • Middle Aged
  • Peptides* / chemistry
  • Peptides* / classification
  • Peptides* / genetics
  • Peptides* / metabolism
  • Polymorphism, Genetic
  • Proteomics*
  • ROC Curve
  • Rats

Substances

  • Peptides