PGlcS: Prediction of protein O-GlcNAcylation sites with multiple features and analysis

J Theor Biol. 2015 Sep 7:380:524-9. doi: 10.1016/j.jtbi.2015.06.026. Epub 2015 Jun 24.

Abstract

As a widespread type of protein post-translational modification, O-GlcNAcylation plays crucial regulatory roles in almost all cellular processes and is related to some diseases. To understand the mechanisms of O-GlcNAcylation in depth, it is crucial to identify substrates and their specific O-GlcNAcylation sites. Experimental identification is expensive and time-consuming, so computational prediction of O-GlcNAcylation sites has considerable value. In this work, we developed a novel O-GlcNAcylation site predictor called PGlcS (Prediction of O-GlcNAcylated Sites), which uses k-means clustering to obtain informative and reliable negative samples and a support vector machine classifier combined with two-step feature selection. The performance of PGlcS was evaluated on an independent testing dataset, resulting in a sensitivity of 64.62%, a specificity of 68.4%, an accuracy of 68.37%, and a Matthews correlation coefficient of 0.0697, which demonstrated that PGlcS is very promising for predicting O-GlcNAcylation sites. The datasets and source code are available in the Supplementary information.
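
For readers unfamiliar with the four evaluation measures reported above, the following is a minimal sketch of how sensitivity, specificity, accuracy, and the Matthews correlation coefficient are conventionally computed from a binary confusion matrix. The function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration; they are not taken from the paper or its supplementary code.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification measures from confusion counts.

    tp/fn: positive (modified) sites predicted correctly / missed.
    tn/fp: negative sites predicted correctly / wrongly flagged.
    """
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # MCC denominator is zero when any row or column of the matrix is empty;
    # returning 0.0 in that case is a common convention, assumed here.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not the paper's test set).
metrics = binary_metrics(tp=40, fp=20, tn=80, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, so it is sensitive to the balance between positive and negative sites in the test set.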

Keywords: A two-step feature selection; O-GlcNAcylated mechanisms; Support vector machines; k-means cluster.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Acetylglucosamine / metabolism*
  • Acylation
  • Protein Processing, Post-Translational
  • Proteins / metabolism*

Substances

  • Proteins
  • Acetylglucosamine