A computational system for identifying operons based on RNA-seq data

Methods. 2020 Apr 1;176:62-70. doi: 10.1016/j.ymeth.2019.03.026. Epub 2019 Apr 4.

Abstract

An operon is a set of neighboring genes in a genome that is transcribed as a single polycistronic message. Genes that are part of the same operon often have related functional roles or participate in the same metabolic pathways. Most bacterial genes are co-transcribed with one or more other genes as part of a multi-gene operon. Thus, accurate identification of operons is important for understanding the co-regulation of genes and their functional relationships. Here, we present a computational system that uses RNA-seq data to determine operons throughout a genome. The system takes the name of a genome and one or more files of RNA-seq data as input. Our method combines primary genomic sequence information with expression data from the RNA-seq files in a unified probabilistic model to identify operons. We assess our method's ability to accurately identify operons in a range of species through comparison to external databases of operons, both experimentally confirmed and computationally predicted, and through focused experiments that confirm new operons identified by our method. Our system is freely available at https://cs.wellesley.edu/~btjaden/Rockhopper/.

Keywords: Bacteria; Bioinformatics; Operon; Polycistronic; RNA-seq; Transcription.

Publication types

  • Research Support, N.I.H., Extramural
  • Review

MeSH terms

  • Gene Regulatory Networks
  • Genome, Bacterial / genetics*
  • Genomics / methods*
  • Models, Genetic
  • Operon / genetics*
  • RNA-Seq / methods*
  • Transcription, Genetic