Identifying N6-methyladenosine sites using extreme gradient boosting system optimized by particle swarm optimizer

J Theor Biol. 2019 Apr 21;467:39-47. doi: 10.1016/j.jtbi.2019.01.035. Epub 2019 Jan 31.

Abstract

N6-methyladenosine (m6A) is one of the most important RNA modifications, playing roles in processes ranging from splicing, mRNA export and mRNA stability to cell differentiation. Because m6A is widely distributed across genes, identifying m6A sites in RNA sequences is of significant importance for basic biomedical research and drug development. High-throughput laboratory methods are time-consuming and costly, so effective computational methods are highly desirable for their convenience and speed. In this article, we propose a new method that improves m6A site prediction by combining deep features and original features with extreme gradient boosting optimized by particle swarm optimization (PXGB). The proposed PXGB algorithm uses three kinds of features: position-specific nucleotide propensity (PSNP), position-specific dinucleotide propensity (PSDP), and the traditional nucleotide composition (NC). Under 10-fold cross-validation, PXGB achieved an AUC of 0.8390 and an MCC of 0.5234. PXGB was also compared with existing methods, and its higher MCC and AUC demonstrate that PXGB is effective at predicting m6A sites. The predictor proposed in this study may help to identify more m6A sites and guide related experimental validation.
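The abstract names the three hand-crafted feature groups (NC, PSNP, PSDP) but does not define them. The sketch below assumes the definition commonly used by earlier m6A predictors: for each position i, the propensity of a k-mer is its frequency at i in the positive (m6A) training sequences minus its frequency at i in the negative sequences, with k = 1 for PSNP and k = 2 for PSDP. It also assumes equal-length RNA windows over the alphabet ACGU. All function names and the toy sequences are hypothetical, and the paper's "deep features" are not reproduced here.

```python
from typing import Dict, List, Sequence

NUCS = "ACGU"  # RNA alphabet (assumption: sequences use U, not T)


def nucleotide_composition(seq: str) -> List[float]:
    """NC: relative frequency of A, C, G and U in one sequence."""
    return [seq.count(b) / len(seq) for b in NUCS]


def position_freqs(seqs: Sequence[str], k: int) -> List[Dict[str, float]]:
    """Frequency of each k-mer starting at every position, over equal-length sequences."""
    length = len(seqs[0])
    table: List[Dict[str, float]] = []
    for i in range(length - k + 1):
        counts: Dict[str, int] = {}
        for s in seqs:
            kmer = s[i:i + k]
            counts[kmer] = counts.get(kmer, 0) + 1
        table.append({m: c / len(seqs) for m, c in counts.items()})
    return table


def propensity_table(pos: Sequence[str], neg: Sequence[str], k: int) -> List[Dict[str, float]]:
    """Z[i][kmer] = f_pos(i, kmer) - f_neg(i, kmer); k=1 gives PSNP, k=2 gives PSDP."""
    fp, fn = position_freqs(pos, k), position_freqs(neg, k)
    return [{m: p.get(m, 0.0) - q.get(m, 0.0) for m in set(p) | set(q)}
            for p, q in zip(fp, fn)]


def encode(seq: str, table: List[Dict[str, float]], k: int) -> List[float]:
    """Look up the propensity of the k-mer observed at each position of seq."""
    return [table[i].get(seq[i:i + k], 0.0) for i in range(len(table))]


def feature_vector(seq: str, psnp: List[Dict[str, float]],
                   psdp: List[Dict[str, float]]) -> List[float]:
    """Concatenate NC + PSNP + PSDP into one feature vector."""
    return nucleotide_composition(seq) + encode(seq, psnp, 1) + encode(seq, psdp, 2)


# Toy training data (hypothetical, for illustration only).
pos = ["AAGAC", "GAGAC", "AGGAC"]
neg = ["CAUUU", "UAGCU", "AAUCG"]
psnp = propensity_table(pos, neg, 1)
psdp = propensity_table(pos, neg, 2)
x = feature_vector("AAGAC", psnp, psdp)
print(len(x))  # 4 NC + 5 PSNP + 4 PSDP = 13
```

Because the propensity tables are estimated from labelled data, in a cross-validation setting they would need to be rebuilt from the training folds only, so that held-out labels do not leak into the features.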
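The abstract states that the XGBoost model is tuned by particle swarm optimization but does not list which hyperparameters are searched, the swarm size, or the PSO coefficients. A minimal global-best PSO might look like the following, assuming an inertia weight w = 0.729 and acceleration coefficients c1 = c2 = 1.49445 (values widely used in the PSO literature). The objective `toy_cv_error` is a hypothetical stand-in for "1 - mean cross-validated AUC of an XGBoost model", and searching over `learning_rate` and `max_depth` is likewise an illustrative assumption.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    f: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 20,
    n_iters: int = 50,
    w: float = 0.729,
    c1: float = 1.49445,
    c2: float = 1.49445,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Global-best PSO: minimise f over the box given by bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    lo = [b[0] for b in bounds]
    hi = [b[1] for b in bounds]
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo[d], hi[d]) for d in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clamp the particle back into the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo[d]), hi[d])
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_cv_error(params: List[float]) -> float:
    """Hypothetical stand-in for 1 - mean 10-fold CV AUC of an XGBoost model.

    params = [learning_rate, max_depth]; a real objective would train XGBoost
    (rounding max_depth to an int) and score it on held-out folds.
    """
    lr, depth = params
    return (lr - 0.1) ** 2 + (depth - 6.0) ** 2 / 100.0


best, best_val = pso_minimize(toy_cv_error, [(0.01, 0.3), (2.0, 10.0)])
print(best, best_val)
```

Each particle remembers its own best position, and the swarm shares one global best; PSO needs only objective values, not gradients, which suits hyperparameter search where each evaluation is a full cross-validation run.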
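The two figures reported for PXGB are the area under the ROC curve (AUC) and the Matthews correlation coefficient (MCC). As a reference for how these are computed from a classifier's output, here is a small sketch using their standard definitions: MCC from confusion-matrix counts, and AUC as the probability that a randomly chosen positive scores higher than a randomly chosen negative (ties counted as one half). The abstract does not say whether the 10-fold results are pooled or averaged across folds, so that step is left out; the function names are hypothetical.

```python
import math
from typing import Sequence


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal is empty (MCC undefined).
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """ROC AUC as the probability a positive outranks a negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


print(mcc(tp=40, tn=45, fp=5, fn=10))
print(auc([0.9, 0.7, 0.4], [0.8, 0.3, 0.2]))  # 7 of 9 pairs ranked correctly
```

The pairwise AUC is quadratic in the number of samples and is meant only to make the definition concrete; rank-based formulas give the same value in O(n log n).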

Keywords: Extreme gradient boosting; M(6)A sites; N(6)-methyladenosine; Particle swarm optimization; XGBoost.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adenosine / analogs & derivatives*
  • Adenosine / analysis
  • Algorithms
  • Animals
  • Area Under Curve
  • Base Sequence / genetics*
  • Computational Biology / methods*
  • Humans

Substances

  • N-methyladenosine
  • Adenosine