Context-aware transcript quantification from long-read RNA-seq data with Bambu

Ying Chen; Andre Sim; Yuk Kei Wan; Keith Yeo; Joseph Jing Xian Lee; Min Hao Ling; Michael I Love; Jonathan Göke

doi:10.1038/s41592-023-01908-w

Nat Methods. 2023 Aug;20(8):1187-1195. doi: 10.1038/s41592-023-01908-w. Epub 2023 Jun 12.

Authors

Ying Chen^#¹, Andre Sim^#¹, Yuk Kei Wan¹ ², Keith Yeo¹, Joseph Jing Xian Lee¹, Min Hao Ling¹, Michael I Love³ ⁴, Jonathan Göke⁵ ⁶

Affiliations

¹ Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore.
² Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Republic of Singapore.
³ Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA.
⁴ Department of Genetics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA.
⁵ Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore. gokej@gis.a-star.edu.sg.
⁶ Department of Statistics and Data Science, National University of Singapore, Singapore, Republic of Singapore. gokej@gis.a-star.edu.sg.

^# Contributed equally.

Abstract

Most approaches to transcript quantification rely on fixed reference annotations; however, the transcriptome is dynamic and, depending on the context, such static annotations contain inactive isoforms for some genes, whereas they are incomplete for others. Here we present Bambu, a method that performs machine-learning-based transcript discovery to enable quantification specific to the context of interest using long-read RNA-sequencing. To identify novel transcripts, Bambu estimates the novel discovery rate, which replaces arbitrary per-sample thresholds with a single, interpretable, precision-calibrated parameter. Bambu retains the full-length and unique read counts, enabling accurate quantification in the presence of inactive isoforms. Compared to existing methods for transcript discovery, Bambu achieves greater precision without sacrificing sensitivity. We show that context-aware annotations improve quantification for both novel and known transcripts. We apply Bambu to quantify isoforms from repetitive HERVH-LTR7 retrotransposons in human embryonic stem cells, demonstrating the ability for context-specific transcript expression analysis.
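
The idea of replacing a per-sample score cutoff with a single rate parameter can be illustrated with a toy sketch. This is not Bambu's implementation: the abstract does not give a formula, so the definition used here (the fraction of retained candidates that are absent from the reference annotation) is an assumption made for illustration, and the scores, the function names `novel_discovery_rate` and `threshold_for_ndr`, and the candidate list are all hypothetical.

```python
# Toy illustration only; NOT Bambu's algorithm.
# Assumption: each candidate transcript has a model score in [0, 1] and a flag
# saying whether it matches a transcript in the reference annotation.

def novel_discovery_rate(candidates, threshold):
    """Fraction of candidates scoring >= threshold that are not annotated."""
    kept = [(score, annotated) for score, annotated in candidates
            if score >= threshold]
    if not kept:
        return 0.0
    novel = sum(1 for _, annotated in kept if not annotated)
    return novel / len(kept)


def threshold_for_ndr(candidates, max_ndr):
    """Lowest observed score whose rate stays at or below max_ndr.

    The rate is not guaranteed to be monotonic in the threshold, so every
    observed score is checked rather than stopping at the first violation.
    Returns None if no observed score satisfies the limit.
    """
    best = None
    for score in sorted({s for s, _ in candidates}, reverse=True):
        if novel_discovery_rate(candidates, score) <= max_ndr:
            best = score
    return best


# Hypothetical candidates: (score, present_in_reference_annotation)
candidates = [
    (0.95, True),
    (0.90, True),
    (0.85, False),
    (0.70, True),
    (0.60, False),
    (0.40, False),
]

if __name__ == "__main__":
    cutoff = threshold_for_ndr(candidates, max_ndr=0.3)
    print("score cutoff:", cutoff)
    print("rate at cutoff:", novel_discovery_rate(candidates, cutoff))
```

The point of the sketch is the direction of the mapping: the user states a tolerated rate once, and the score cutoff is derived from each dataset rather than chosen by hand.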

Publication types

Research Support, Non-U.S. Gov't
Research Support, N.I.H., Extramural

MeSH terms

Gene Expression Profiling* / methods
Humans
Protein Isoforms / genetics
RNA-Seq
Sequence Analysis, RNA / methods
Transcriptome*

Substances

Protein Isoforms

Grants and funding

R01 HG009937/HG/NHGRI NIH HHS/United States