CRBPDL: Identification of circRNA-RBP interaction sites using an ensemble neural network approach

PLoS Comput Biol. 2022 Jan 20;18(1):e1009798. doi: 10.1371/journal.pcbi.1009798. eCollection 2022 Jan.

Abstract

Circular RNAs (circRNAs) are non-coding RNAs with a special circular structure formed by the reverse splicing mechanism. Increasing evidence shows that circRNAs can bind directly to RNA-binding proteins (RBPs) and play an important role in a variety of biological activities. The interactions between circRNAs and RBPs are key to understanding the mechanism of posttranscriptional regulation, so accurately identifying binding sites is very useful for analyzing these interactions. Several machine learning (ML) predictors have been proposed in past research, but their prediction accuracy still needs to be improved. We therefore present a novel computational model, CRBPDL, which uses an AdaBoost-integrated deep hierarchical network to identify circRNA-RBP binding sites. CRBPDL combines five different feature encoding schemes to encode the original RNA sequence and uses deep multiscale residual networks (MSRN) and bidirectional gated recurrent units (BiGRUs) to learn high-level feature representations, extracting local and global context information at the same time. Additionally, a self-attention mechanism is employed to improve the robustness of CRBPDL. Finally, the AdaBoost algorithm is applied to integrate the deep learning (DL) models and improve the prediction performance and reliability of the model. To verify the usefulness of CRBPDL, we compared its performance with state-of-the-art methods on 37 circular RNA data sets and 31 linear RNA data sets. The results show that CRBPDL is general, reliable, and robust. The code and data sets are available at https://github.com/nmt315320/CRBPDL.git.
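
The abstract names AdaBoost as the step that integrates the deep learning models but does not describe it further. As a rough illustration only, the sketch below shows textbook discrete AdaBoost on toy data, with one-feature decision stumps standing in for the MSRN/BiGRU base learners. The function names (`train_stump`, `adaboost_fit`, `adaboost_predict`) and the toy data are hypothetical and are not taken from the CRBPDL repository.

```python
import math


def train_stump(X, y, w):
    """Return the (error, feature, threshold, polarity) stump with the
    lowest weighted error. A stand-in for a real base learner."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                preds = [pol if row[j] >= thr else -pol for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best


def stump_predict(stump, row):
    _, j, thr, pol = stump
    return pol if row[j] >= thr else -pol


def adaboost_fit(X, y, n_rounds=3):
    """Discrete AdaBoost for labels in {-1, +1}."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump = train_stump(X, y, w)
        # Clamp the error so the logarithm below stays finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)  # learner weight
        ensemble.append((alpha, stump))
        # Up-weight misclassified samples, down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted vote of all base learners."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy data (hypothetical): no single stump separates these labels.
X = [[0], [1], [2], [3], [4], [5]]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost_fit(X, y, n_rounds=3)
predictions = [adaboost_predict(ensemble, row) for row in X]
```

On this toy set a single stump misclassifies at least two samples, while the weighted vote of three stumps reproduces every label. That is the general motivation for boosting weak or moderately strong learners, whatever the base model is.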

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Binding Sites / genetics
  • Computational Biology
  • Machine Learning
  • Models, Biological*
  • Neural Networks, Computer*
  • RNA Splicing / genetics
  • RNA, Circular* / chemistry
  • RNA, Circular* / genetics
  • RNA, Circular* / metabolism
  • RNA-Binding Proteins* / chemistry
  • RNA-Binding Proteins* / genetics
  • RNA-Binding Proteins* / metabolism

Substances

  • RNA, Circular
  • RNA-Binding Proteins

Grants and funding

The work (QZ) was supported by the National Natural Science Foundation of China (No. 62131004, No. 61922020), the Sichuan Provincial Science Fund for Distinguished Young Scholars (2021JDJQ0025), and the Special Science Foundation of Quzhou (2021D004). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.