In-depth method assessments of differentially expressed protein detection for shotgun proteomics data with missing values

Sci Rep. 2017 Jun 13;7(1):3367. doi: 10.1038/s41598-017-03650-8.

Abstract

As one of the major goals in quantitative proteomics, the detection of differentially expressed proteins (DEPs) plays an important role in biomarker selection and clinical diagnostics. Many algorithms and tools address DEP detection in proteomics research. However, because these methods differ in application scope and experimental designs vary widely, the best choice for large-scale proteomics data analysis is not apparent. Moreover, because proteomics data usually contain a high percentage of missing values (MVs) but few replicates, a systematic evaluation of DEP detection methods combined with MV imputation methods is essential and urgent. Here, we analyzed four representative imputation methods and five DEP detection methods on different experimental and simulated datasets. The results showed that (i) MV imputation did not always improve the performance of DEP detection methods, and the effect of imputation varied with the percentage of missing values; and (ii) the DEP detection methods differed in statistical power, which was affected by the percentage of MVs. Two statistical methods (the empirical Bayesian random censoring threshold model and significance analysis of microarrays) performed better than the other evaluated methods in terms of accuracy and sensitivity.
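To make the interaction between MV imputation and DEP detection concrete, the following is a minimal sketch and not the pipeline used in the study. It assumes group-wise mean imputation and Welch's t statistic purely for illustration; neither is confirmed by the abstract to be among the evaluated methods, and the intensities and function names are hypothetical.

```python
import math
import statistics


def impute_group_mean(values):
    """Replace missing entries (None) with the mean of the observed entries.

    Illustrative choice only; not necessarily a method evaluated in the study.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to impute from
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]


def welch_t(group_a, group_b):
    """Welch's t statistic on observed values; None if it cannot be computed."""
    a = [v for v in group_a if v is not None]
    b = [v for v in group_b if v is not None]
    if len(a) < 2 or len(b) < 2:
        return None
    se2 = statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    if se2 == 0:
        return None
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)


# Hypothetical log2 intensities for one protein, three replicates per
# condition, with one missing value in each condition.
control = [20.1, 20.3, None]
treated = [22.0, None, 22.4]

# Test statistic using only the observed values.
t_observed = welch_t(control, treated)

# Test statistic after filling the gaps with the group mean.
t_imputed = welch_t(impute_group_mean(control), impute_group_mean(treated))

print("t (observed only):", t_observed)
print("t (after imputation):", t_imputed)
```

In this toy case the imputed values sit exactly on the group means, so the variance estimate shrinks and the magnitude of the statistic grows without any new measurement being added. This is one simple way in which imputation can change a DEP call without necessarily improving it.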

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Bayes Theorem
  • Data Interpretation, Statistical*
  • Escherichia coli Proteins / metabolism*
  • HeLa Cells
  • Humans
  • Neoplasm Proteins / metabolism*
  • Proteome / analysis*
  • Proteomics / methods*
  • ROC Curve

Substances

  • Escherichia coli Proteins
  • Neoplasm Proteins
  • Proteome