Development of a sugar-binding residue prediction system from protein sequences using support vector machine

Masaki Banno; Yusuke Komiyama; Wei Cao; Yuya Oku; Kokoro Ueki; Kazuya Sumikoshi; Shugo Nakamura; Tohru Terada; Kentaro Shimizu

doi:10.1016/j.compbiolchem.2016.10.009

Comput Biol Chem. 2017 Feb:66:36-43. doi: 10.1016/j.compbiolchem.2016.10.009. Epub 2016 Nov 9.

Authors

Masaki Banno¹, Yusuke Komiyama², Wei Cao¹, Yuya Oku¹, Kokoro Ueki¹, Kazuya Sumikoshi¹, Shugo Nakamura¹, Tohru Terada¹, Kentaro Shimizu³

Affiliations

¹ Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ward, Tokyo 113-8657, Japan.
² Digital Content and Media Sciences Research Division, National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-Ward, Tokyo 101-8430, Japan.
³ Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ward, Tokyo 113-8657, Japan. Electronic address: shimizu@bi.a.u-tokyo.ac.jp.

PMID: 27889654
DOI: 10.1016/j.compbiolchem.2016.10.009

Abstract

Several methods have been proposed for protein-sugar binding site prediction using machine learning algorithms. However, they do not effectively learn the varied properties of binding site residues that arise from the diverse interactions between proteins and sugars. In this study, we classified sugars into acidic and nonacidic sugars and showed that their binding sites have different amino acid occurrence frequencies. Based on this result, we developed sugar-binding residue predictors dedicated to the two classes of sugars: an acidic sugar binding predictor and a nonacidic sugar binding predictor. We also developed a combination predictor that combines the results of the two predictors. We showed that when a sugar is known to be acidic, the acidic sugar binding predictor achieves the best performance, and that when a sugar is known to be nonacidic or its class is unknown, the combination predictor achieves the best performance. Our method uses only amino acid sequences for prediction. A support vector machine was used as the machine learning algorithm, and the position-specific scoring matrix (PSSM) created by the position-specific iterative basic local alignment search tool (PSI-BLAST) was used as the feature vector. We evaluated the performance of the predictors using five-fold cross-validation. We have released our system as an open source freeware tool in a GitHub repository (https://doi.org/10.5281/zenodo.61513).
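
As an illustration of how a position-specific scoring matrix can serve as a per-residue feature vector, the following minimal Python sketch flattens the matrix rows in a window centered on each residue. The abstract does not state the window size, the padding scheme, or any score scaling, so the function name `window_features`, the default `window=3`, and the zero padding at the sequence ends are assumptions made for illustration only, not the authors' implementation. The toy matrix has two columns for brevity; a real position-specific scoring matrix has 20 columns, one per amino acid.

```python
from typing import List


def window_features(pssm: List[List[float]], window: int = 3) -> List[List[float]]:
    """Build one feature vector per residue from a PSSM.

    Each vector concatenates the PSSM rows inside a window centered on
    the residue. Positions beyond either end of the sequence are filled
    with zeros (an assumed padding scheme).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    if not pssm:
        return []

    width = len(pssm[0])          # number of PSSM columns
    half = window // 2            # residues on each side of the center
    pad = [0.0] * width           # zero row used outside the sequence

    features = []
    for i in range(len(pssm)):
        vec: List[float] = []
        for j in range(i - half, i + half + 1):
            row = pssm[j] if 0 <= j < len(pssm) else pad
            vec.extend(row)
        features.append(vec)
    return features


# Toy PSSM: 4 residues x 2 columns (hypothetical values).
toy_pssm = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
feats = window_features(toy_pssm, window=3)
for residue_index, vec in enumerate(feats):
    print(residue_index, vec)
```

Each resulting vector, paired with a binding or non-binding label for its central residue, could then be passed to a support vector machine and evaluated with five-fold cross-validation.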

Keywords: Carbohydrate; Machine learning; Sugar-binding proteins; Sugar-binding residue prediction; Support vector machine.

MeSH terms

Binding Sites
Carbohydrates / chemistry*
Cluster Analysis
Proteins / metabolism*
Support Vector Machine*

Substances

Carbohydrates
Proteins