A Hybrid Deep Learning Model for Predicting Protein Hydroxylation Sites

Int J Mol Sci. 2018 Sep 18;19(9):2817. doi: 10.3390/ijms19092817.

Abstract

Protein hydroxylation is a type of post-translational modification (PTM) that plays critical roles in human diseases. Protein sequences are known to contain many uncharacterized proline and lysine residues. The question to be answered is which of these residues can be hydroxylated and which cannot. The answer will not only help elucidate the mechanism of hydroxylation but can also benefit the development of new drugs. In this paper, we propose a novel approach for predicting hydroxylation sites using a hybrid deep learning model that integrates a convolutional neural network (CNN) and a long short-term memory network (LSTM). We employed a pseudo amino acid composition (PseAAC) method to construct valid benchmark datasets based on a sliding window strategy and used the position-specific scoring matrix (PSSM) to represent samples as inputs to the deep learning model. In addition, we compared our method with popular predictors, including CNN, iHyd-PseAAC, and iHyd-PseCp. Results from all 5-fold cross-validations demonstrated that our method significantly outperforms the other methods in prediction accuracy.
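
The sliding window strategy mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the window size or how sequence termini are handled, so the half-width of 6 residues and the padding character "X" below are assumptions, and the function name extract_windows is hypothetical. Each extracted window is a candidate sample centered on a proline (P) or lysine (K) residue; labeling and PSSM encoding are not shown.

```python
from typing import List, Tuple


def extract_windows(
    sequence: str,
    targets: str = "PK",   # candidate residues: proline and lysine
    half_width: int = 6,   # assumed value; not given in the abstract
    pad: str = "X",        # assumed placeholder for positions past the termini
) -> List[Tuple[int, str, str]]:
    """Return (1-based position, residue, window) for every target residue."""
    # Pad both ends so that residues near the termini still yield full windows.
    padded = pad * half_width + sequence + pad * half_width
    windows = []
    for i, residue in enumerate(sequence):
        if residue in targets:
            # Residue i sits at index i + half_width in the padded string,
            # so the centered window starts at index i.
            window = padded[i : i + 2 * half_width + 1]
            windows.append((i + 1, residue, window))
    return windows


if __name__ == "__main__":
    for position, residue, window in extract_windows("ACPGK", half_width=2):
        print(position, residue, window)
```

Every window has length 2 * half_width + 1 with the candidate residue at its center, which gives fixed-size inputs regardless of where the residue falls in the protein.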

Keywords: convolutional neural network (CNN); hydroxylation sites; iHyd-PseAAC; iHyd-PseCp; long short-term memory network (LSTM); protein post-translational modification (PTM).

MeSH terms

  • Deep Learning*
  • Humans
  • Hydroxylation
  • Hydroxylysine / chemistry*
  • Hydroxylysine / metabolism
  • Hydroxyproline / chemistry*
  • Hydroxyproline / metabolism
  • Models, Biological
  • Neural Networks, Computer
  • Protein Processing, Post-Translational
  • Proteins / chemistry*
  • Proteins / metabolism

Substances

  • Proteins
  • Hydroxylysine
  • Hydroxyproline