Review, evaluation, and discussion of the challenges of missing value imputation for mass spectrometry-based label-free global proteomics

Bobbie-Jo M Webb-Robertson; Holli K Wiberg; Melissa M Matzke; Joseph N Brown; Jing Wang; Jason E McDermott; Richard D Smith; Karin D Rodland; Thomas O Metz; Joel G Pounds; Katrina M Waters

doi:10.1021/pr501138h

J Proteome Res. 2015 May 1;14(5):1993-2001. doi: 10.1021/pr501138h. Epub 2015 Apr 22.

Affiliation

¹ Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States.

Abstract

In this review, we apply selected imputation strategies to label-free liquid chromatography-mass spectrometry (LC-MS) proteomics datasets to evaluate their accuracy with respect to metrics of variance and classification. We evaluate several commonly used imputation approaches on their individual merits and discuss the caveats of each approach with respect to the example LC-MS proteomics data. In general, local similarity-based approaches, such as the regularized expectation maximization and least-squares adaptive algorithms, yield the best overall performance with respect to metrics of accuracy and robustness. However, no single algorithm consistently outperforms the others, and in some cases classification without imputation yielded the most accurate results. Thus, because of the complex mechanisms of missing data in proteomics, which also vary from peptide to protein, no individual method is a single solution for imputation. On the basis of the observations in this review, the goal for imputation in the field of computational proteomics should be to develop new approaches that work generically for this data type and new strategies to guide users in the selection of the best imputation method for their dataset and analysis objectives.
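The kind of accuracy evaluation described above can be illustrated with a minimal sketch: hide a fraction of known intensities, impute them, and score each method by mean-square error on the hidden entries. Everything in it is an assumption made for illustration and is not taken from the review: the synthetic peptide-by-sample matrix, the 20% masking rate, missingness that is completely at random (real LC-MS missingness is often intensity dependent), and the two simple methods used, row-mean and k-nearest-neighbor imputation, which stand in for the regularized expectation maximization and least-squares adaptive algorithms named in the abstract.

```python
import math
import random


def mask_entries(matrix, fraction, rng):
    """Copy the matrix and hide `fraction` of its observed entries (set to None)."""
    observed = [(i, j) for i, row in enumerate(matrix)
                for j, v in enumerate(row) if v is not None]
    hidden = rng.sample(observed, int(len(observed) * fraction))
    masked = [row[:] for row in matrix]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def impute_row_mean(matrix):
    """Fill each missing value with the mean of the observed values in its row."""
    out = []
    for row in matrix:
        seen = [v for v in row if v is not None]
        fill = sum(seen) / len(seen) if seen else 0.0
        out.append([fill if v is None else v for v in row])
    return out


def impute_knn(matrix, k=3):
    """Fill each missing value with the mean of that column over the k most similar rows."""
    def distance(a, b):
        # Root-mean-square difference over the columns both rows observe.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    fallback = impute_row_mean(matrix)
    out = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, v in enumerate(row):
            if v is not None:
                continue
            candidates = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(matrix)
                if m != i and other[j] is not None
            )
            nearest = [val for d, val in candidates[:k] if d < math.inf]
            out[i][j] = sum(nearest) / len(nearest) if nearest else fallback[i][j]
    return out


def mse(truth, imputed, positions):
    """Mean-square error restricted to the hidden positions."""
    return sum((truth[i][j] - imputed[i][j]) ** 2 for i, j in positions) / len(positions)


# Synthetic log-intensities (hypothetical): 30 peptides x 8 samples,
# each value = peptide baseline + sample effect + small noise.
rng = random.Random(0)
sample_effect = [rng.gauss(0, 1) for _ in range(8)]
truth = []
for _ in range(30):
    baseline = rng.uniform(15, 25)
    truth.append([baseline + s + rng.gauss(0, 0.1) for s in sample_effect])

masked, hidden = mask_entries(truth, 0.2, rng)
by_mean = impute_row_mean(masked)
by_knn = impute_knn(masked, k=3)
mse_mean = mse(truth, by_mean, hidden)
mse_knn = mse(truth, by_knn, hidden)
print(f"row-mean MSE: {mse_mean:.3f}")
print(f"kNN MSE:      {mse_knn:.3f}")
```

Which method scores better here depends on the synthetic structure and the masking mechanism, consistent with the abstract's point that no single algorithm wins on every dataset.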

Keywords: Imputation; accuracy; classification; label free; mean-square error; peak intensity.

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.
Review

MeSH terms

Algorithms
Animals
Blood Proteins / analysis*
Chromatography, Liquid / statistics & numerical data*
Humans
Lung / chemistry
Mass Spectrometry / statistics & numerical data*
Mice
Peptides / analysis*
Proteomics / methods
Proteomics / statistics & numerical data*

Substances

Blood Proteins
Peptides
