Development and implementation of a core genome multilocus sequence typing scheme for Haemophilus influenzae

Made Ananda Krisna; Keith A Jolley; William Monteith; Alexandra Boubour; Raph L Hamers; Angela B Brueggemann; Odile B Harrison; Martin C J Maiden

doi:10.1099/mgen.0.001281

Microb Genom. 2024 Aug;10(8):001281. doi: 10.1099/mgen.0.001281.

Authors

Made Ananda Krisna¹,²,³, Keith A Jolley², William Monteith²,⁴, Alexandra Boubour⁵, Raph L Hamers¹,³, Angela B Brueggemann⁵, Odile B Harrison²,⁵, Martin C J Maiden²

Affiliations

¹ Nuffield Department of Medicine, Centre for Tropical Medicine and Global Health, University of Oxford, Oxford, UK.
² Department of Biology, University of Oxford, Oxford, UK.
³ Oxford University Clinical Research Unit Indonesia, Faculty of Medicine Universitas Indonesia, Jakarta, Indonesia.
⁴ Department of Biology and Biochemistry, University of Bath, Bath, UK.
⁵ Nuffield Department of Population Health, University of Oxford, Oxford, UK.

Abstract

Haemophilus influenzae is part of the human nasopharyngeal microbiota and a pathogen causing invasive disease. The extensive genetic diversity observed in H. influenzae necessitates discriminatory analytical approaches to evaluate its population structure. This study developed a core genome multilocus sequence typing (cgMLST) scheme for H. influenzae using pangenome analysis tools and validated the scheme using datasets consisting of complete reference genomes (N = 14) and high-quality draft H. influenzae genomes (N = 2297). The draft genome dataset was divided into a development dataset (N = 921) and a validation dataset (N = 1376). The development dataset was used to identify potential core genes, and the validation dataset was used to refine the final core gene list to ensure the reliability of the proposed cgMLST scheme. Functional classifications were made for all the resulting core genes. Phylogenetic analyses were performed using both allelic profiles and nucleotide sequence alignments of the core genome, and congruence between the two was assessed by Spearman's correlation and ordinary least squares linear regression. Preliminary analyses using the development dataset identified 1067 core genes, which were refined to 1037 with the validation dataset. More than 70% of core genes were predicted to encode proteins essential for metabolism or genetic information processing. Phylogenetic and statistical analyses indicated that the core genome allelic profile accurately represented phylogenetic relatedness among the isolates (R² = 0.945). We used this cgMLST scheme to define a high-resolution population structure for H. influenzae, which enhances the genomic analysis of this clinically relevant human pathogen.
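
The congruence test described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the isolate names, allelic profiles and aligned sequences are invented toy data, and the choice of distances (number of differing loci, number of mismatched alignment columns) is an assumption made for illustration. The sketch computes pairwise allelic and nucleotide distances, then Spearman's correlation and the R² of a simple ordinary least squares fit between them.

```python
from itertools import combinations

# Toy data (hypothetical): allele numbers at five core loci and a short alignment.
profiles = {
    "iso_A": [1, 1, 1, 1, 1],
    "iso_B": [1, 1, 2, 1, 1],
    "iso_C": [2, 1, 2, 3, 1],
    "iso_D": [2, 4, 3, 3, 2],
}
alignment = {
    "iso_A": "ACGTACGTAC",
    "iso_B": "ACGTTCGTAC",
    "iso_C": "ACCTTCGAAC",
    "iso_D": "TCCTTGGAAT",
}


def allelic_distance(p, q):
    """Number of loci with different alleles; loci missing (None) in either profile are skipped."""
    return sum(1 for a, b in zip(p, q) if a is not None and b is not None and a != b)


def nucleotide_distance(s, t):
    """Number of mismatched columns between two aligned sequences of equal length."""
    return sum(1 for a, b in zip(s, t) if a != b)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def ols_r_squared(x, y):
    """R² of a one-predictor OLS fit with intercept, which equals the squared Pearson r."""
    return pearson(x, y) ** 2


pairs = list(combinations(sorted(profiles), 2))
allelic = [allelic_distance(profiles[a], profiles[b]) for a, b in pairs]
nucleotide = [nucleotide_distance(alignment[a], alignment[b]) for a, b in pairs]

rho = spearman(allelic, nucleotide)
r2 = ols_r_squared(allelic, nucleotide)
print(f"Spearman rho = {rho:.3f}, OLS R^2 = {r2:.3f}")
```

A high rank correlation and R² on real data, such as the R² = 0.945 reported above, indicate that allele-based distances preserve the ordering and scale of sequence-based distances closely enough for allelic profiles to stand in for alignments.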

Keywords: Haemophilus influenzae; cgMLST; core genome; population genetics; typing scheme.

MeSH terms

Genetic Variation
Genome, Bacterial*
Haemophilus Infections / microbiology
Haemophilus influenzae* / classification
Haemophilus influenzae* / genetics
Humans
Multilocus Sequence Typing* / methods
Phylogeny*

Grants and funding

WT_/Wellcome Trust/United Kingdom