SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues

PLoS One. 2015 Jul 15;10(7):e0133260. doi: 10.1371/journal.pone.0133260. eCollection 2015.

Abstract

Protein-nucleic acid interactions are central to many fundamental biological processes, and automated methods that reliably identify DNA- and RNA-binding residues in protein sequences are increasingly important. Most current algorithms rely on feature-based prediction, but their accuracy leaves room for improvement. Here we propose a sequence-based hybrid algorithm, SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder), that merges a feature predictor, SNBRFinderF, with a template predictor, SNBRFinderT. SNBRFinderF was built using a support vector machine whose inputs include the sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with a sequence alignment algorithm based on profile hidden Markov models to capture weakly homologous templates of the query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and that SNBRFinderT achieved performance comparable to structure-based template methods. By leveraging the complementary relationship between these two predictors, SNBRFinder improved the performance of both DNA- and RNA-binding residue prediction. More importantly, the sequence-based hybrid prediction was competitive with our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has clear advantages over existing sequence-based prediction algorithms. To make the algorithm readily usable, we established an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.
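
The abstract does not state how the feature predictor and the template predictor are merged. As a purely illustrative sketch, one simple way to combine two per-residue predictors is a weighted average of their scores, falling back to the feature score alone when no template is found. The function name `combine_scores`, the `weight` and `threshold` parameters, and the assumption that both scores lie in [0, 1] are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence


def combine_scores(
    feature_scores: Sequence[float],
    template_scores: Optional[Sequence[float]],
    weight: float = 0.5,      # hypothetical mixing weight, not from the paper
    threshold: float = 0.5,   # hypothetical decision cutoff, not from the paper
) -> List[bool]:
    """Label each residue as binding (True) or non-binding (False).

    feature_scores:  per-residue scores from a feature-based predictor.
    template_scores: per-residue scores from a template-based predictor,
                     or None when no usable template was found.
    Both score lists are assumed to be scaled to [0, 1].
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")

    if template_scores is None:
        # No template available: rely on the feature predictor alone.
        merged = list(feature_scores)
    else:
        if len(feature_scores) != len(template_scores):
            raise ValueError("score lists must cover the same residues")
        # Weighted average of the two predictors, residue by residue.
        merged = [
            weight * f + (1.0 - weight) * t
            for f, t in zip(feature_scores, template_scores)
        ]

    return [score >= threshold for score in merged]


if __name__ == "__main__":
    feature = [0.9, 0.2, 0.6]
    template = [0.7, 0.1, 0.2]
    print(combine_scores(feature, template))  # hybrid prediction
    print(combine_scores(feature, None))      # feature-only fallback
```

In this toy setting the two predictors can disagree on a residue, and the merged score decides the final label; the actual combination scheme used by SNBRFinder may differ.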

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Amino Acid Sequence
  • Amino Acids / metabolism*
  • Computational Biology / methods*
  • Databases, Protein
  • Internet
  • Machine Learning
  • Models, Molecular
  • Protein Binding
  • Reproducibility of Results
  • Sequence Analysis, Protein*
  • Structural Homology, Protein
  • Templates, Genetic

Substances

  • Amino Acids

Grants and funding

This work was supported by the National Natural Science Foundation of China (31301091) and the Fundamental Research Funds for the Central Universities (2014JC003). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.