A universally applicable method of operon map prediction on minimally annotated genomes using conserved genomic context

Martin T Edwards; Stuart C G Rison; Neil G Stoker; Lorenz Wernisch

doi:10.1093/nar/gki634

Nucleic Acids Res. 2005 Jun 7;33(10):3253-62. doi: 10.1093/nar/gki634. Print 2005.

Authors

Martin T Edwards¹, Stuart C G Rison, Neil G Stoker, Lorenz Wernisch

Affiliation

¹ School of Crystallography, Birkbeck College London WC1E 7HX, UK. m.edwards@mail.cryst.bbk.ac.uk

Abstract

An important step in understanding the regulation of a prokaryotic genome is the generation of its transcription unit map. The current strongest operon predictor depends on the distributions of intergenic distances (IGD) separating adjacent genes within and between operons. Unfortunately, experimental data on these distance distributions are limited to Escherichia coli and Bacillus subtilis. We suggest a new graph algorithmic approach based on comparative genomics to identify clusters of conserved genes independent of IGD and conservation of gene order. As a consequence, distance distributions of operon pairs for any arbitrary prokaryotic genome can be inferred. For E. coli, the algorithm predicts 854 conserved adjacent pairs with a precision of 85%. The IGD distribution for these pairs is virtually identical to the E. coli operon pair distribution. Statistical analysis of the predicted pair IGD distribution allows estimation of a genome-specific operon IGD cut-off, obviating the requirement for a training set in IGD-based operon prediction. We apply the method to a representative set of eight genomes, and show that these genome-specific IGD distributions differ considerably from each other and from the distribution in E. coli.
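
To make the idea of an IGD cut-off concrete, the following is a minimal sketch of how adjacent same-strand gene pairs might be classified once a cut-off is available. It is not the authors' graph algorithm, and it does not reproduce their statistical estimation of the cut-off. The `Gene` record, the function names, the 1-based inclusive coordinates, and the cut-off value used in the example are all assumptions made for illustration.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    """Hypothetical gene record; coordinates are assumed 1-based and inclusive."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def intergenic_distance(upstream: Gene, downstream: Gene) -> int:
    """Number of bases between two genes; negative when the genes overlap."""
    return downstream.start - upstream.end - 1


def classify_adjacent_pairs(
    genes: List[Gene], cutoff: int
) -> List[Tuple[str, str, int, bool]]:
    """Return (gene_a, gene_b, igd, within_cutoff) for adjacent same-strand pairs.

    Pairs on opposite strands are skipped, since they cannot share a transcript.
    The cut-off is an input here; how to estimate it is outside this sketch.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    results = []
    for a, b in zip(ordered, ordered[1:]):
        if a.strand != b.strand:
            continue
        igd = intergenic_distance(a, b)
        results.append((a.name, b.name, igd, igd <= cutoff))
    return results


if __name__ == "__main__":
    # Toy coordinates, invented purely for illustration.
    toy_genes = [
        Gene("a", 1, 100, "+"),
        Gene("b", 111, 200, "+"),
        Gene("c", 500, 600, "+"),
        Gene("d", 650, 700, "-"),
    ]
    for row in classify_adjacent_pairs(toy_genes, cutoff=50):
        print(row)
```

In this sketch the cut-off is passed in as a plain integer. In the approach described in the abstract, that value would instead come from a statistical analysis of the IGD distribution of the predicted conserved pairs for the genome in question.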

Publication types

Evaluation Study
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Base Sequence
Chromosome Mapping / methods*
Conserved Sequence
Escherichia coli / genetics
Genome, Bacterial
Genomics / methods*
Operon*

Grants and funding

WT_/Wellcome Trust/United Kingdom