Computational prediction of human salivary proteins from blood circulation and application to diagnostic biomarker identification

PLoS One. 2013 Nov 12;8(11):e80211. doi: 10.1371/journal.pone.0080211. eCollection 2013.

Abstract

Proteins can move from blood circulation into salivary glands through active transport, passive diffusion or ultrafiltration, and some of them are then released into saliva; if accurately identified, such proteins could serve as biomarkers for diseases. We present a novel computational method for predicting salivary proteins that originate from circulation. The prediction is based on a set of physicochemical and sequence features that we found to discriminate between human proteins known to move from circulation into saliva and proteins deemed not to be present in saliva. A classifier was trained on these features using a support vector machine (SVM) to predict protein secretion into saliva. The classifier achieved 88.56% average recall and 90.76% average precision in 10-fold cross-validation on the training data, indicating that the selected features are informative. Because our negative training data (proteins predicted not to be in saliva) may not be highly reliable, we also trained a ranking method on the same features, which aims to rank the known salivary proteins from circulation highest among the proteins in the general background. When coupled with information on proteins differentially expressed in diseased versus healthy control tissues and with a prediction capability for blood-secretory proteins, this prediction capability can be used to identify potential biomarker proteins for specific human diseases. Using such integrated information, we predicted 31 candidate biomarker proteins in saliva for breast cancer.
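The abstract does not give the training or evaluation code, so the following is only a rough, hedged sketch of the workflow it describes. It trains a linear SVM with Pegasos-style sub-gradient descent on the hinge loss (a swapped-in solver; the authors' kernel, solver and hyperparameters are not stated here), estimates average recall and precision with stratified 10-fold cross-validation, and expresses the integration step as a set intersection. The synthetic two-dimensional data stands in for the physicochemical and sequence features. All function names (`train_linear_svm`, `predict`, `cross_validate`, `candidate_biomarkers`, `make_toy_data`), the toy data, and the per-fold averaging are hypothetical assumptions, not the paper's method.

```python
import random
from typing import List, Sequence, Set, Tuple

Vector = List[float]


def train_linear_svm(X: Sequence[Vector], y: Sequence[int], lam: float = 0.01,
                     epochs: int = 50, seed: int = 0) -> Vector:
    """Hypothetical Pegasos-style linear SVM; labels in {-1, +1}.

    A constant 1.0 is appended to every feature vector so the bias is
    learned (and regularized) as an ordinary weight.
    """
    rng = random.Random(seed)
    aug = [list(x) + [1.0] for x in X]
    w = [0.0] * len(aug[0])
    idx = list(range(len(aug)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, aug[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step towards the example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, aug[i])]
    return w


def predict(w: Vector, x: Vector) -> int:
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def safe_div(num: int, den: int) -> float:
    return num / den if den else 0.0


def cross_validate(X: Sequence[Vector], y: Sequence[int], k: int = 10,
                   seed: int = 0) -> Tuple[float, float]:
    """Stratified k-fold CV; returns recall and precision averaged over folds."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == -1]
    rng.shuffle(pos)
    rng.shuffle(neg)
    folds = [pos[f::k] + neg[f::k] for f in range(k)]  # each fold keeps class ratio
    recalls, precisions = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        w = train_linear_svm([X[i] for i in train], [y[i] for i in train])
        tp = fp = fn = 0
        for i in fold:
            pred = predict(w, X[i])
            if pred == 1 and y[i] == 1:
                tp += 1
            elif pred == 1:
                fp += 1
            elif y[i] == 1:
                fn += 1
        recalls.append(safe_div(tp, tp + fn))
        precisions.append(safe_div(tp, tp + fp))
    return sum(recalls) / k, sum(precisions) / k


def candidate_biomarkers(diff_expressed: Set[str], blood_secretory: Set[str],
                         saliva_secretable: Set[str]) -> List[str]:
    """Hypothetical integration step: proteins passing all three filters."""
    return sorted(diff_expressed & blood_secretory & saliva_secretable)


def make_toy_data(n_per_class: int = 50, seed: int = 1) -> Tuple[List[Vector], List[int]]:
    """Synthetic, well-separated clusters standing in for real protein features."""
    rng = random.Random(seed)
    X: List[Vector] = []
    y: List[int] = []
    for label, centre in ((1, 3.0), (-1, -3.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    rec, prec = cross_validate(X, y)
    print(f"average recall={rec:.2%}, average precision={prec:.2%}")
```

Averaging the per-fold scores is one common reading of "average recall" and "average precision"; pooling true and false positives over all folds before dividing is another, and the abstract does not say which was used.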

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biomarkers / metabolism*
  • Breast Neoplasms / metabolism
  • Computational Biology / methods*
  • Female
  • Humans
  • Proteins / metabolism
  • Salivary Glands / metabolism*

Substances

  • Biomarkers
  • Proteins

Grants and funding

This work is supported by the Natural Science Foundation of China (61272207), the Science-Technology Development Projects of Jilin Province of China (20120730, 20130522111JH, 20130522114JH), the Ph.D. Program Foundation of MOE of China (20120061110094, 20120061120106), and the Postdoctoral Science Foundation (2012M520678). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.