Prediction of Neuropeptides from Sequence Information Using Ensemble Classifier and Hybrid Features

J Proteome Res. 2020 Sep 4;19(9):3732-3740. doi: 10.1021/acs.jproteome.0c00276. Epub 2020 Aug 14.

Abstract

As hormones in the endocrine system and neurotransmitters in the immune system, neuropeptides (NPs) provide many opportunities for the discovery of new drugs and targets for nervous system disorders. Despite their importance in hormonal regulation and immune responses, a bioinformatics predictor for identifying NPs is lacking. In this study, we develop a predictor for the identification of NPs, named PredNeuroP, based on a two-layer stacking method. In this ensemble predictor, 45 models are introduced as base-learners by combining nine feature descriptors with five machine learning algorithms. In the first layer, we select eight base-learners according to the sum of the accuracy and the Pearson correlation coefficient of base-learner pairs. In the second layer, the outputs of these selected base-learners are fed into a logistic regression classifier to train the final model, whose outputs are the final predictions. The accuracy of PredNeuroP is 0.893 and 0.872 on the training and test data sets, respectively. The consistent performance on these data sets supports the practicability of our predictor. We therefore expect that PredNeuroP will be an important advance in the discovery of NPs as new drugs for the treatment of nervous system disorders. The data sets and Python code are available at https://github.com/xialab-ahu/PredNeuroP.
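
The two-layer idea described in the abstract can be illustrated with a minimal sketch. This is not the PredNeuroP code: the two base-learners, their output probabilities, and all function names below are hypothetical toy values, and a plain gradient-descent logistic regression stands in for whatever implementation the authors used. The sketch only shows how first-layer outputs can be compared by Pearson correlation and then used as input features for a second-layer logistic regression.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_meta_learner(meta_features, labels, lr=1.0, epochs=5000):
    """Fit a logistic regression by batch gradient descent.

    meta_features: one row per sample, one column per base-learner output.
    Returns (weights, bias).
    """
    n = len(labels)
    w = [0.0] * len(meta_features[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for row, y in zip(meta_features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - y
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(meta_features, w, b):
    """Label a sample 1 when the second-layer probability is at least 0.5."""
    return [
        1 if sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) >= 0.5 else 0
        for row in meta_features
    ]

# Hypothetical first-layer outputs (predicted probability of "neuropeptide")
# from two made-up base-learners on six toy sequences.
labels = [1, 1, 1, 0, 0, 0]
base_outputs = {
    "learner_a": [0.9, 0.8, 0.6, 0.3, 0.2, 0.1],
    "learner_b": [0.7, 0.4, 0.8, 0.4, 0.3, 0.2],
}

# Correlation between the two base-learners' outputs (one ingredient a
# selection step could use; the exact rule in the paper is not reproduced here).
r_ab = pearson(base_outputs["learner_a"], base_outputs["learner_b"])

# Second layer: base-learner outputs become the feature columns.
meta_features = [list(row) for row in zip(*base_outputs.values())]
w, b = fit_meta_learner(meta_features, labels)
predictions = predict(meta_features, w, b)
```

In a real stacking setup, the first-layer outputs used to train the second layer would normally come from cross-validated (out-of-fold) predictions rather than from fitting and predicting on the same samples, as the toy data here does.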

Keywords: Pearson correlation coefficient; machine learning; neuropeptide; stacking method.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology
  • Machine Learning*
  • Neuropeptides* / genetics

Substances

  • Neuropeptides