TargetM6A: Identifying N6-Methyladenosine Sites From RNA Sequences via Position-Specific Nucleotide Propensities and a Support Vector Machine

IEEE Trans Nanobioscience. 2016 Oct;15(7):674-682. doi: 10.1109/TNB.2016.2599115. Epub 2016 Aug 10.

Abstract

As one of the most ubiquitous post-transcriptional modifications of RNA, N6-methyladenosine (m6A) plays an essential role in many vital biological processes. Identifying m6A sites in RNAs is important for both basic biomedical research and practical drug development. In this study, we designed a computational method, called TargetM6A, to rapidly and accurately identify m6A sites solely from primary RNA sequences. Two new features, position-specific nucleotide propensities (PSNP) and position-specific dinucleotide propensities (PSDP), are introduced and combined with the traditional nucleotide composition (NC) feature to represent RNA sequences. The extracted features are then reduced to a more compact and discriminative subset by applying an incremental feature selection (IFS) procedure. Based on the optimized feature subset, we trained TargetM6A on the training dataset with a support vector machine (SVM) as the prediction engine. We compared TargetM6A with existing methods for predicting m6A sites by performing stringent jackknife tests and independent validation tests on benchmark datasets. The experimental results show that TargetM6A outperformed the existing methods and improved prediction performance, with MCC = 0.526 and AUC = 0.818. We also provide a user-friendly web server for TargetM6A, which is publicly accessible for academic use at http://csbio.njust.edu.cn/bioinf/TargetM6A.
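The abstract does not define the PSNP feature, so the following is only an illustrative sketch of one plausible reading: at each position of a fixed-length window, the propensity of a nucleotide is taken as its frequency among positive (methylated) samples minus its frequency among negative samples. The function names, the five-nucleotide windows, and the toy sequences are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter

ALPHABET = "ACGU"  # RNA nucleotides


def position_frequencies(seqs):
    """Per-position nucleotide frequencies for equal-length sequences."""
    length = len(seqs[0])
    freqs = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        freqs.append({n: counts[n] / len(seqs) for n in ALPHABET})
    return freqs


def psnp_matrix(positives, negatives):
    """Assumed propensity: positive-set frequency minus negative-set frequency."""
    pos = position_frequencies(positives)
    neg = position_frequencies(negatives)
    return [{n: p[n] - q[n] for n in ALPHABET} for p, q in zip(pos, neg)]


def encode(seq, matrix):
    """Map a sequence to one propensity value per position."""
    return [matrix[i][n] for i, n in enumerate(seq)]


# Toy windows centered on an adenosine (hypothetical data, not from the benchmark).
positives = ["GGACU", "AGACA", "GGACA"]
negatives = ["CUAGU", "UCAUG", "ACAGC"]

matrix = psnp_matrix(positives, negatives)
features = encode("GGACU", matrix)
print(features)
```

Under the same assumption, a dinucleotide variant (PSDP) would count adjacent pairs at each position instead of single nucleotides. The resulting vectors, concatenated with nucleotide composition, would then be passed through feature selection and into an SVM classifier as the abstract describes.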

MeSH terms

  • Adenosine / analogs & derivatives*
  • Adenosine / analysis
  • Adenosine / chemistry
  • Computational Biology / methods*
  • RNA / analysis
  • RNA / chemistry*
  • Saccharomyces cerevisiae / genetics
  • Sequence Analysis, RNA / methods*
  • Support Vector Machine*

Substances

  • RNA
  • N-methyladenosine
  • Adenosine