Prospects for a sequence-based taxonomy of influenza A virus subtypes

Virus Evol. 2024 Aug 17;10(1):veae064. doi: 10.1093/ve/veae064. eCollection 2024.

Abstract

Hemagglutinin (HA) and neuraminidase (NA) proteins are the primary antigenic targets of influenza A virus (IAV) infections. IAV infections are generally classified into subtypes of HA and NA proteins, e.g. H3N2. Most of the known subtypes were originally defined by a lack of antibody cross-reactivity. However, genetic sequencing has played an increasingly important role in characterizing the evolving diversity of IAV. Novel subtypes have recently been described solely by their genetic sequences, and IAV infections are routinely subtyped by molecular assays or by comparing sequences to references. In this study, I carry out a comparative analysis of all available IAV protein sequences in the GenBank database (over 1.1 million, reduced to 272,292 unique sequences prior to phylogenetic reconstruction) to determine whether the serologically defined subtypes can be reproduced with sequence-based criteria. I show that a robust genetic taxonomy of HA and NA subtypes can be obtained using a simple clustering method, namely, by progressively partitioning the phylogeny on its longest internal branches. However, this taxonomy also requires some amendments to the current nomenclature. For example, two IAV isolates from bats previously characterized as a divergent lineage of H9N2 should be separated into their own subtype. With the exception of these small and highly divergent lineages, the phylogenies relating each of the other six genomic segments do not support partitions into major subtypes.
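
The clustering idea named in the abstract, partitioning a phylogeny on its longest internal branches, can be illustrated with a minimal Python sketch. It is not the study's implementation. The edge-list tree representation, the function name `partition_by_longest_branches`, and the stopping rule (a fixed number of clusters) are assumptions made for illustration only; the abstract does not state how the number of partitions is chosen.

```python
from collections import defaultdict


def partition_by_longest_branches(edges, n_clusters):
    """Split an unrooted tree into groups of tips by cutting long internal branches.

    edges: list of (node_a, node_b, branch_length) tuples (hypothetical format).
    n_clusters: assumed stopping rule; cutting k-1 branches gives k subtrees.
    Returns a list of sets of tip names.
    """
    # Tips are the nodes that touch exactly one branch.
    degree = defaultdict(int)
    for a, b, _ in edges:
        degree[a] += 1
        degree[b] += 1
    tips = {node for node, d in degree.items() if d == 1}

    # Internal branches join two non-tip nodes; terminal branches are never cut.
    internal = [e for e in edges if e[0] not in tips and e[1] not in tips]
    internal.sort(key=lambda e: e[2], reverse=True)

    # Cutting the longest branch first, then the next longest, and so on,
    # is the same as removing the top (n_clusters - 1) internal branches.
    cut = {(a, b) for a, b, _ in internal[: n_clusters - 1]}

    adjacency = defaultdict(list)
    for a, b, _ in edges:
        if (a, b) not in cut:
            adjacency[a].append(b)
            adjacency[b].append(a)

    # Collect the tips in each remaining connected component.
    # Traversal starts from tips, so a fragment without tips is ignored.
    clusters, seen = [], set()
    for start in sorted(tips):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in tips:
                members.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(members)
    return clusters


# Toy tree: six tips (A-F) and four internal nodes (n1-n4).
# Tip D sits on a long terminal branch, which is deliberately not a cut candidate.
toy_edges = [
    ("A", "n1", 0.1), ("B", "n1", 0.1),
    ("n1", "n2", 2.0),
    ("C", "n2", 0.1),
    ("n2", "n3", 0.5),
    ("D", "n3", 5.0),
    ("n3", "n4", 3.0),
    ("E", "n4", 0.1), ("F", "n4", 0.1),
]
toy_clusters = partition_by_longest_branches(toy_edges, 3)
```

In this toy tree the two longest internal branches (n3 to n4 and n1 to n2) are cut, so the tips fall into three groups. Restricting cuts to internal branches keeps a single divergent sequence on a long terminal branch from being split off as its own cluster.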