Regulatory genome annotation of 33 insect species

eLife. 2024 Oct 11;13:RP96738. doi: 10.7554/eLife.96738.

Abstract

Annotation of newly sequenced genomes frequently includes genes, but rarely covers important non-coding genomic features such as the cis-regulatory modules (e.g., enhancers and silencers) that regulate gene expression. Here, we begin to remedy this situation by developing a workflow for rapid initial annotation of insect regulatory sequences, and provide a searchable database resource with enhancer predictions for 33 genomes. Using our previously developed SCRMshaw computational enhancer prediction method, we predict over 2.8 million regulatory sequences, along with the tissues where they are expected to be active, in a set of insect species spanning over 360 million years of evolution. Extensive analysis and validation of the data provide several lines of evidence suggesting that we achieve a high true-positive rate for enhancer prediction. One, we show that our predictions target specific loci, rather than random genomic locations. Two, we predict enhancers in orthologous loci across a diverged set of species to a significantly higher degree than expected by chance. Three, we demonstrate that our predictions are highly enriched for regions of accessible chromatin. Four, we achieve a validation rate in excess of 70% using in vivo reporter gene assays. As we continue to annotate both new tissues and new species, our regulatory annotation resource will provide a rich source of data for the research community and will have utility for both small-scale (single gene, single species) and large-scale (many genes, many species) studies of gene regulation. In particular, the ability to search for functionally related regulatory elements in orthologous loci should greatly facilitate studies of enhancer evolution even among distantly related species.
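
The claim that predictions are enriched for accessible chromatin rests on comparing observed overlap against a random expectation. The abstract does not describe how that comparison was carried out, so the following is only a minimal sketch of one common approach, a shuffle-based overlap test on a single chromosome. The function names, the half-open coordinate convention, and the uniform random placement of intervals are all assumptions made for illustration and are not taken from the paper or from SCRMshaw.

```python
import random
from bisect import bisect_left


def fraction_overlapping(intervals, regions):
    """Fraction of half-open [start, end) intervals overlapping any region.

    Assumption: `regions` is sorted by start and its members do not overlap.
    """
    starts = [a for a, _ in regions]
    hits = 0
    for s, e in intervals:
        # Rightmost region whose start lies before the interval's end.
        i = bisect_left(starts, e) - 1
        if i >= 0 and regions[i][1] > s:
            hits += 1
    return hits / len(intervals)


def overlap_enrichment(predictions, regions, chrom_length,
                       n_shuffles=1000, seed=0):
    """Compare observed overlap with overlap of randomly placed intervals.

    Returns (observed fraction, mean shuffled fraction, empirical p-value).
    Hypothetical null model: each prediction is moved to a uniformly random
    position on the same chromosome, keeping its length.
    """
    rng = random.Random(seed)
    observed = fraction_overlapping(predictions, regions)
    null = []
    for _ in range(n_shuffles):
        shuffled = []
        for s, e in predictions:
            length = e - s
            new_start = rng.randint(0, chrom_length - length)
            shuffled.append((new_start, new_start + length))
        null.append(fraction_overlapping(shuffled, regions))
    mean_null = sum(null) / len(null)
    # Add-one correction keeps the empirical p-value strictly above zero.
    p_value = (1 + sum(f >= observed for f in null)) / (n_shuffles + 1)
    return observed, mean_null, p_value


# Toy coordinates, invented purely to exercise the functions.
accessible = [(100, 200), (500, 600)]
predicted = [(150, 160), (520, 530), (590, 610), (900, 910)]
observed, mean_null, p_value = overlap_enrichment(predicted, accessible, 10_000)
```

A genome-scale analysis would normally use a dedicated interval library and a null model that respects chromosome structure and sequence composition; the uniform shuffle here is only the simplest stand-in for "random genomic locations".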

Keywords: D. melanogaster; chromosomes; cis-regulation; enhancer prediction; enhancers; gene expression; genetics; genome annotation; genomics; insects; regulatory genomics.

MeSH terms

  • Animals
  • Computational Biology / methods
  • Databases, Genetic
  • Enhancer Elements, Genetic / genetics
  • Genome, Insect* / genetics
  • Insecta* / classification
  • Insecta* / genetics
  • Molecular Sequence Annotation*

Associated data

  • Dryad/10.5061/dryad.3j9kd51t0
  • GEO/GSE101827
  • GEO/GSE38727
  • GEO/GSE118240
  • GEO/GSE104495
  • GEO/GSE152924