Detecting m6A RNA modification from nanopore sequencing using a semisupervised learning framework

Genome Res. 2024 Nov 20;34(11):1987-1999. doi: 10.1101/gr.278960.124.

Abstract

Direct nanopore-based RNA sequencing can be used to detect posttranscriptional base modifications, such as N6-methyladenosine (m6A), based on the electric current signals produced by the distinct chemical structures of modified bases. A key challenge is the scarcity of adequate training data with known methylation modifications. We present Xron, a hybrid encoder-decoder framework that delivers a direct methylation-distinguishing basecaller by training on synthetic RNA data and immunoprecipitation (IP)-based experimental data in two steps. First, we generate data with more diverse modification combinations through in silico cross-linking. Second, we use this data set to train an end-to-end neural network basecaller, which we then fine-tune on IP-based experimental data with label smoothing. The trained neural network basecaller outperforms existing methylation detection methods on both read-level and site-level prediction scores. Xron is a standalone, end-to-end m6A-distinguishing basecaller capable of detecting methylated bases directly from raw sequencing signals, enabling de novo methylome assembly.
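
The abstract names label smoothing as part of the fine-tuning step but does not state the smoothing scheme or its strength. As an illustration only, the following minimal sketch shows the common uniform form of label smoothing for a single base call. The class list, the value epsilon = 0.1, and the function names are assumptions made for this example and are not taken from Xron.

```python
import math

# Hypothetical output classes for one position; not taken from Xron.
CLASSES = ["A", "m6A"]


def smooth_labels(target_index, num_classes, epsilon=0.1):
    """Uniform label smoothing: (1 - epsilon) * one_hot + epsilon / num_classes.

    epsilon = 0.1 is an assumed value; the abstract does not report one.
    """
    off_value = epsilon / num_classes
    return [
        (1.0 - epsilon) + off_value if i == target_index else off_value
        for i in range(num_classes)
    ]


def cross_entropy(predicted_probs, target_probs):
    """Cross-entropy between a soft target and predicted probabilities."""
    return -sum(
        t * math.log(p)
        for t, p in zip(target_probs, predicted_probs)
        if t > 0.0
    )


# A read position labeled "m6A" by an IP-based experiment (hypothetical example).
hard_target = [0.0, 1.0]
soft_target = smooth_labels(CLASSES.index("m6A"), len(CLASSES))

# A confident prediction that disagrees with the label.
prediction = [0.99, 0.01]

loss_hard = cross_entropy(prediction, hard_target)
loss_soft = cross_entropy(prediction, soft_target)
```

With the smoothed target, part of the probability mass sits on the other class, so a confident prediction that disagrees with the label incurs a smaller loss than it would against a hard one-hot target.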

MeSH terms

  • Adenosine* / analogs & derivatives
  • Adenosine* / metabolism
  • Humans
  • Machine Learning
  • Methylation
  • Nanopore Sequencing* / methods
  • Neural Networks, Computer
  • RNA / chemistry
  • RNA / genetics
  • RNA Methylation
  • RNA Processing, Post-Transcriptional
  • Sequence Analysis, RNA / methods

Substances

  • Adenosine
  • N-methyladenosine
  • RNA