Identification of Human Enzymes Using Amino Acid Composition and the Composition of k-Spaced Amino Acid Pairs

Biomed Res Int. 2020 May 22;2020:9235920. doi: 10.1155/2020/9235920. eCollection 2020.

Abstract

Enzymes are proteins that efficiently catalyze specific biochemical reactions, and they are widely present in the human body. An efficient method for identifying human enzymes is vital for selecting enzymes from the vast number of human proteins and for investigating their functions. Nevertheless, only a limited amount of research has been conducted on classifying human proteins as enzymes or nonenzymes. In this work, we developed a support vector machine (SVM)-based predictor that classifies human enzymes using the amino acid composition (AAC), the composition of k-spaced amino acid pairs (CKSAAP), and informative amino acid pairs chosen with a feature selection technique. A training dataset of 1117 human enzymes and 2099 nonenzymes and a test dataset of 684 human enzymes and 1270 nonenzymes were constructed to train and test the proposed model. In jackknife cross-validation, the overall accuracy was 76.46% on the training set and 76.21% on the test set, both higher than the 72.6% reported in previous research. We also compared various feature extraction methods and mainstream classifiers on this task, and we selected and compared informative feature parameters of the k-spaced amino acid pairs. The results suggest that our classifier can identify human enzymes effectively and efficiently and can help in understanding their functions and developing new drugs.
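The two sequence descriptors named above can be sketched as follows. This is a minimal sketch of the commonly used AAC and CKSAAP formulations, not the authors' code: the gap limit `k_max = 5`, the per-gap normalization, and the skipping of nonstandard residues are assumptions, and the helper names `aac`, `cksaap`, and `encode` are hypothetical.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature indices are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs: "AA", "AC", ..., "YY".
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(seq: str) -> list[float]:
    """Amino acid composition: fraction of each standard residue (20 values)."""
    residues = [a for a in seq.upper() if a in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def cksaap(seq: str, k_max: int = 5) -> list[float]:
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.

    For each gap k, count ordered pairs (seq[i], seq[i + k + 1]) and divide by
    the number of valid pairs at that gap, giving 400 values per k.
    k_max = 5 is an assumed default, not a value taken from the paper.
    """
    seq = seq.upper()
    features: list[float] = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(seq) - k - 1):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # pairs with nonstandard residues (e.g. X) are skipped
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


def encode(seq: str, k_max: int = 5) -> list[float]:
    """Concatenate AAC (20 values) and CKSAAP (400 * (k_max + 1) values)."""
    return aac(seq) + cksaap(seq, k_max)
```

Under `k_max = 5` this yields 20 + 2400 = 2420 features per protein. The feature selection step described in the abstract would then keep only the informative k-spaced pairs; its exact criterion is not given here.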
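Jackknife cross-validation, as used for the accuracies reported above, is leave-one-out testing: each protein is classified by a model trained on all the remaining proteins, and the overall accuracy is the fraction classified correctly. The sketch below shows only that evaluation loop. To stay dependency-free it swaps the SVM for a hypothetical nearest-centroid classifier (`nearest_centroid_trainer`); that classifier, the `Trainer` signature, and the function names are illustrative assumptions, not the paper's setup.

```python
from typing import Callable

Vector = list[float]
# Hypothetical trainer signature: fit on (xs, ys) and return a predict function.
Trainer = Callable[[list[Vector], list[int]], Callable[[Vector], int]]


def nearest_centroid_trainer(xs: list[Vector], ys: list[int]) -> Callable[[Vector], int]:
    """Stand-in classifier (NOT an SVM): predict the label of the nearest class mean."""
    centroids: dict[int, Vector] = {}
    for label in sorted(set(ys)):
        members = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]

    def predict(x: Vector) -> int:
        # Squared Euclidean distance to each class centroid; the smallest wins.
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return predict


def jackknife_accuracy(xs: list[Vector], ys: list[int], trainer: Trainer) -> float:
    """Leave-one-out (jackknife) test: predict each sample with a model trained on all others."""
    correct = 0
    for i in range(len(xs)):
        # Retrain from scratch without sample i, then classify sample i.
        predict = trainer(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        if predict(xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)
```

Because the model is retrained once per sample, jackknife testing on the 3216-protein training set requires 3216 fits. With scikit-learn installed, the same protocol with an actual SVM can be written as `cross_val_score(SVC(), X, y, cv=LeaveOneOut())`, though the kernel and hyperparameters would still need to be chosen.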

MeSH terms

  • Algorithms
  • Amino Acids / chemistry*
  • Computational Biology
  • Databases, Protein
  • Enzymes / chemistry*
  • Enzymes / classification
  • Humans
  • Proteins / chemistry*
  • Proteins / classification
  • Support Vector Machine

Substances

  • Amino Acids
  • Enzymes
  • Proteins