TERIUS: accurate prediction of lncRNA via high-throughput sequencing data representing RNA-binding protein association

BMC Bioinformatics. 2018 Feb 19;19(Suppl 1):41. doi: 10.1186/s12859-018-2013-9.

Abstract

Background: LncRNAs are long regulatory non-coding RNAs, some of which are arguably predicted to have coding potential. Although coding potential classifiers that utilize ribosome profiling data successfully detect actively translated regions, they are less sensitive to lncRNAs. Furthermore, lncRNA annotation can be susceptible to false positives arising from 3' untranslated region (UTR) fragments of mRNAs.

Results: To address these limitations in lncRNA annotation, we present a novel tool, TERIUS, that provides a two-step filtration process to distinguish bona fide lncRNAs from false ones. The first step successfully separates lncRNAs from protein-coding genes, showing enhanced sensitivity compared to other methods. To eliminate 3'UTR fragments, the second step takes advantage of the 3'UTR-specific association with regulator of nonsense transcripts 1 (UPF1), leading to refined lncRNA annotation. Importantly, TERIUS enabled the detection of misclassified transcripts in published lncRNA annotations.
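The two-step filtration described above can be sketched as follows. This is a minimal illustration of the control flow only, not the TERIUS implementation: the `Transcript` fields, the `classify` function, and both cutoff values are hypothetical and were chosen for the example, since the abstract does not state how either score is computed or thresholded.

```python
from dataclasses import dataclass

# Hypothetical cutoffs for illustration only; these are not values from TERIUS.
CODING_SCORE_CUTOFF = 0.5
UPF1_ENRICHMENT_CUTOFF = 2.0


@dataclass
class Transcript:
    name: str
    coding_score: float     # assumed: higher means stronger evidence of translation
    upf1_enrichment: float  # assumed: UPF1-bound signal relative to background


def classify(t: Transcript) -> str:
    """Label a transcript using two sequential filters."""
    # Step 1: separate protein-coding transcripts from lncRNA candidates.
    if t.coding_score >= CODING_SCORE_CUTOFF:
        return "protein-coding"
    # Step 2: among the remaining candidates, flag those showing the
    # 3'UTR-like UPF1 association as likely 3'UTR fragments.
    if t.upf1_enrichment >= UPF1_ENRICHMENT_CUTOFF:
        return "3'UTR fragment"
    return "lncRNA"


# Made-up example transcripts, not real data.
candidates = [
    Transcript("tx_coding", coding_score=0.9, upf1_enrichment=0.4),
    Transcript("tx_utr_fragment", coding_score=0.1, upf1_enrichment=3.5),
    Transcript("tx_lncrna", coding_score=0.1, upf1_enrichment=0.3),
]
labels = {t.name: classify(t) for t in candidates}
```

Applying the filters in this order means the UPF1 criterion is only evaluated for transcripts that already passed the coding-potential step, which mirrors the order given in the abstract.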

Conclusions: TERIUS is a robust method for lncRNA annotation that provides an additional filtration step for 3'UTR fragments. TERIUS was able to successfully re-classify GENCODE and miTranscriptome lncRNA annotations. We believe that TERIUS can benefit the construction of extensive and accurate non-coding transcriptome maps in many genomes.

Keywords: LncRNA; LncRNA annotation; RNA binding protein association.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • 3' Untranslated Regions
  • Animals
  • Gene Expression Profiling
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Mice
  • Molecular Sequence Annotation*
  • RNA Helicases / metabolism
  • RNA, Long Noncoding / chemistry*
  • RNA-Binding Proteins / metabolism
  • Sequence Analysis, RNA*
  • Software*
  • Trans-Activators / metabolism

Substances

  • 3' Untranslated Regions
  • RNA, Long Noncoding
  • RNA-Binding Proteins
  • Trans-Activators
  • RNA Helicases
  • UPF1 protein, human