Generation and analysis of a mouse multitissue genome annotation atlas

Genome Res. 2024 Nov 20;34(11):2108-2117. doi: 10.1101/gr.279217.124.

Abstract

Generating an accurate and complete genome annotation for an organism is complex because the cells within each tissue can express a unique set of transcript isoforms from a unique set of genes. A comprehensive genome annotation should contain information on which tissues express which transcript isoforms and at what level. This tissue-level isoform information can then inform a wide range of research questions as well as experimental designs. Long-read sequencing technology combined with advanced full-length cDNA library preparation methods has now reached a throughput and accuracy at which generating these types of annotations is feasible. Here, we show this by generating a genome annotation of the mouse (Mus musculus). We used the nanopore-based R2C2 long-read sequencing method to generate 64 million highly accurate full-length cDNA consensus reads, averaging 5.4 million reads per tissue for a dozen tissues. Using the Mandalorion tool, we processed these reads to generate the Tissue-level Atlas of Mouse Isoforms, which is available as a trackhub for the UCSC Genome Browser and contains at least one full-length isoform for the vast majority of expressed genes in each tissue.

MeSH terms

  • Animals
  • Genome*
  • High-Throughput Nucleotide Sequencing
  • Mice
  • Molecular Sequence Annotation*
  • Organ Specificity
  • Protein Isoforms / genetics

Substances

  • Protein Isoforms