Disulfide Connectivity Prediction Based on Modelled Protein 3D Structural Information and Random Forest Regression

IEEE/ACM Trans Comput Biol Bioinform. 2015 May-Jun;12(3):611-21. doi: 10.1109/TCBB.2014.2359451.

Abstract

Disulfide connectivity is an important structural characteristic of proteins. Accurately predicting disulfide connectivity from protein sequence alone helps to improve the understanding of protein structure and function, especially in the post-genome era, in which large volumes of sequenced proteins without functional annotation are rapidly accumulating. In this study, a new feature extracted from predicted protein 3D structural information is proposed and integrated with traditional features to form a discriminative feature set. Based on the extracted features, a random forest regression model is applied to predict protein disulfide connectivity. We compare the proposed method with popular existing predictors by performing both cross-validation and independent validation tests on benchmark datasets. The experimental results demonstrate the superiority of the proposed method over existing predictors. We believe this superiority stems from both the good discriminative capability of the newly developed features and the powerful modelling capability of the random forest. The web server implementation, called TargetDisulfide, and the benchmark datasets are freely available for academic use at: http://csbio.njust.edu.cn/bioinf/TargetDisulfide.
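
To illustrate how per-pair regression outputs could be turned into a connectivity pattern, the following is a minimal sketch and not the authors' implementation. It assumes that a regressor has already assigned a bonding score to every cysteine pair, and that the predicted pattern is the complete pairing with the highest total score; the abstract does not state how the pattern is selected. The function names, residue positions, and scores are hypothetical.

```python
def enumerate_pairings(items):
    """Yield every way to split an even-length sorted list into pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in enumerate_pairings(remaining):
            yield [(first, partner)] + tail


def best_connectivity(cysteine_positions, pair_scores):
    """Return the pairing with the highest summed score (assumed criterion).

    pair_scores maps (i, j) with i < j to a hypothetical regression output.
    """
    best_pattern, best_total = None, float("-inf")
    for pattern in enumerate_pairings(sorted(cysteine_positions)):
        total = sum(pair_scores[pair] for pair in pattern)
        if total > best_total:
            best_pattern, best_total = pattern, total
    return best_pattern, best_total


# Hypothetical protein with four bonded cysteines (two disulfide bonds).
positions = [5, 12, 30, 47]
scores = {
    (5, 12): 0.10, (5, 30): 0.80, (5, 47): 0.20,
    (12, 30): 0.15, (12, 47): 0.90, (30, 47): 0.05,
}
pattern, total = best_connectivity(positions, scores)
print(pattern)
```

Exhaustive enumeration visits (2n-1)!! pairings for n bonds, so this sketch is practical only for proteins with a small number of disulfide bonds; a maximum-weight matching algorithm would be the usual substitute for larger cases.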

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Computational Biology / methods*
  • Decision Trees
  • Disulfides / chemistry*
  • Models, Molecular*
  • Protein Conformation*
  • Proteins / chemistry*
  • Regression Analysis
  • Sequence Analysis, Protein

Substances

  • Disulfides
  • Proteins