Identification of Unannotated Small Genes in Salmonella

G3 (Bethesda). 2017 Mar 10;7(3):983-989. doi: 10.1534/g3.116.036939.

Abstract

Increasing evidence indicates that many, if not all, small genes encoding proteins ≤100 aa are missing from currently available bacterial genome annotations. To uncover unannotated small genes in the model bacterium Salmonella enterica Typhimurium 14028s, we used the genomic technique ribosome profiling, which provides a snapshot of all mRNAs being translated (the translatome) under a given growth condition. For comprehensive identification of unannotated small genes, we obtained Salmonella translatomes from four different growth conditions: LB, MOPS rich defined medium, and two infection-relevant conditions, low Mg2+ (10 µM) and low pH (5.8). To facilitate the identification of small genes, ribosome profiling data were analyzed in combination with in silico predicted putative open reading frames (ORFs) and transcriptome profiles. As a result, we uncovered 130 unannotated ORFs. Of these, 98% were small ORFs putatively encoding peptides/proteins ≤100 aa, and some were expressed only in the infection-relevant low Mg2+ and/or low pH conditions. We validated the expression of 25 of these ORFs by western blot, including the smallest, which encodes a peptide of 7 aa residues. Our results suggest that many sequenced bacterial genomes are underannotated with regard to small genes and that their gene annotations need to be revised.

Keywords: genome annotation; ribosome profiling; short ORF; small genes; small proteins.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Gene Expression Regulation, Bacterial
  • Genes, Bacterial*
  • Molecular Sequence Annotation*
  • Open Reading Frames / genetics
  • Reproducibility of Results
  • Salmonella / genetics*