An unambiguous scheme for assigning Shiga toxin-producing Escherichia coli (STEC) isolates to subpopulations is desirable. Such a scheme should account for the high genomic plasticity of E. coli and exploit the stratification of STEC into subgroups based on serotype or phylogeny. Our goal was therefore to identify specific marker combinations for improved classification of STEC subtypes. We developed and evaluated two bioinformatic pipelines for identifying genomic markers in large sets of bacterial genome sequences. Pipeline A performed all-against-all BLASTp analyses of gene products predicted in STEC genome test sets against a set of control genomes. Pipeline B identified STEC marker genes by comparing the STEC core proteome with the "pan proteome" of a non-STEC control group. The two pipelines defined overlapping but not identical sets of discriminative markers for different STEC subgroups. The differences in marker prediction resulted from differences in genome assembly, ORF finding, and inclusion cut-offs between the two workflows. Based on the output of the pipelines, we defined new specific markers for STEC serogroups and phylogenetic groups frequently associated with outbreaks and cases of foodborne illness. These included STEC serogroups O157, O26, O45, O103, O111, O121, and O145, Shiga toxin-positive enteroaggregative E. coli O104:H4, and HUS-associated sequence type (ST)306. We evaluated these STEC marker genes for their presence in whole genome sequence data sets. Based on the identified discriminative markers, we developed a multiplex PCR (mPCR) approach for detection and typing of the targeted STEC. The specificity of the mPCR primer pairs was verified using well-defined clinical STEC isolates as well as isolates from the ECOR, DEC, and HUSEC collections. The applicability of the STEC mPCR to food analysis was tested with inoculated milk.
In summary, we evaluated two different strategies for screening large genome sequence data sets for discriminative markers and incorporated novel marker genes found by this genome-wide approach into a DNA-based typing tool that can be used to characterize STEC from clinical and food samples.
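The core-versus-pan comparison behind Pipeline B can be illustrated with plain set operations. The following is a minimal sketch, not the pipeline itself: it assumes that predicted proteins have already been grouped into protein families, and the family identifiers, genome names, and function names are hypothetical. The real workflow additionally depends on genome assembly, ORF finding, and inclusion cut-offs, which this sketch ignores.

```python
def core_proteome(genomes):
    """Protein families present in every genome of the group."""
    family_sets = [set(families) for families in genomes.values()]
    return set.intersection(*family_sets) if family_sets else set()


def pan_proteome(genomes):
    """Protein families present in at least one genome of the group."""
    family_sets = [set(families) for families in genomes.values()]
    return set().union(*family_sets)


def candidate_markers(target_genomes, control_genomes):
    """Families shared by all target genomes and absent from all controls."""
    return core_proteome(target_genomes) - pan_proteome(control_genomes)


# Hypothetical toy data: genome name -> protein family identifiers.
target = {
    "stec_1": {"famA", "famB", "famC"},
    "stec_2": {"famA", "famC", "famD"},
}
control = {
    "ctrl_1": {"famA", "famE"},
    "ctrl_2": {"famB", "famF"},
}

markers = candidate_markers(target, control)
```

In this toy example, `famA` and `famC` form the target core, and `famA` is removed because it also occurs in a control genome, which leaves `famC` as the only candidate marker.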
Keywords: O157; STEC; comparative genomics; multiplex PCR; non-O157.