Requirement or exclusion of inverted repeat sequences with cruciform-forming potential in Escherichia coli revealed by genome-wide analyses

Curr Genet. 2018 Aug;64(4):945-958. doi: 10.1007/s00294-018-0815-y. Epub 2018 Feb 27.

Abstract

Inverted repeat (IR) sequences are DNA sequences that read the same from 5' to 3' in each strand. Some IRs can form cruciforms under the stress of negative supercoiling, and these IRs are widely found in genomes. However, their biological significance remains unclear. The aim of the current study is to explore this issue further. We constructed the first Escherichia coli genome-wide comprehensive map of IRs with cruciform-forming potential. Based on the map, we performed detailed and quantitative analyses. Here, we report that IRs with cruciform-forming potential are statistically enriched in the following five regions: the adjacent regions downstream of the stop codon-coding sites (referred to as the stop codons), on and around the positions corresponding to mRNA ends (referred to as the gene ends), ~20 to ~45 bp upstream of the start codon-coding sites (referred to as the start codons) within the 5'-UTR (untranslated region), ~25 to ~60 bp downstream of the start codons, and promoter regions. For the adjacent regions downstream of the stop codons and on and around the gene ends, most of the IRs with a repeat unit length of ≥ 8 bp and a spacer size of ≤ 8 bp were parts of the intrinsic terminators, regardless of the location, and presumably used for Rho-independent transcription termination. In contrast, fewer IRs were present in the small region preceding the start codons. In E. coli, IRs with cruciform-forming potential are actively placed or excluded in the regulatory regions for the initiation and termination of transcription and translation, indicating their deep involvement or influence in these processes.
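
To make the size thresholds mentioned in the abstract concrete (repeat unit of ≥ 8 bp, spacer of ≤ 8 bp), the following is a minimal sketch of how perfect IRs might be located in a DNA string. It is not the authors' pipeline: the function names, the fixed arm length, the requirement of a perfect match, and the toy sequence are all assumptions made for illustration, and no criterion for cruciform-forming potential beyond arm and spacer size is modeled.

```python
# Illustrative sketch only; names and parameters are hypothetical,
# not taken from the study.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_inverted_repeats(seq, arm=8, max_spacer=8):
    """Find perfect inverted repeats with a fixed arm length.

    Returns a list of (start, arm, spacer) tuples, where `start` is the
    0-based index of the left arm and `spacer` is the number of bases
    between the two arms. Assumption: arms must match exactly; longer
    or imperfect arms are not considered.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2 * arm + 1):
        left = seq[start:start + arm]
        # The right arm must be the reverse complement of the left arm.
        target = reverse_complement(left)
        for spacer in range(max_spacer + 1):
            right_start = start + arm + spacer
            if right_start + arm > len(seq):
                break
            if seq[right_start:right_start + arm] == target:
                hits.append((start, arm, spacer))
    return hits


# Toy sequence: an 8 bp arm, a 4 bp spacer, and the arm's reverse complement.
toy = "CC" + "GCCGATAC" + "TTTT" + reverse_complement("GCCGATAC") + "CC"
hits = find_inverted_repeats(toy)
```

A genome-wide map would additionally need to merge overlapping hits, allow longer arms, and relate each hit to annotated features such as start codons, stop codons, and gene ends; those steps are outside this sketch.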

Keywords: Cruciform; E. coli; Genome-wide distribution; Intrinsic terminator; Inverted repeat (IR) sequence.

MeSH terms

  • 5' Untranslated Regions / genetics
  • Base Sequence / genetics
  • Codon, Terminator / genetics
  • DNA, Cruciform / genetics*
  • Escherichia coli / genetics*
  • Genome, Bacterial / genetics*
  • Inverted Repeat Sequences / genetics*

Substances

  • 5' Untranslated Regions
  • Codon, Terminator
  • DNA, Cruciform
