Protein clustering and RNA phylogenetic reconstruction of the influenza A [corrected] virus NS1 protein allow an update in classification and identification of motif conservation

PLoS One. 2013 May 7;8(5):e63098. doi: 10.1371/journal.pone.0063098. Print 2013.

Abstract

The non-structural protein 1 (NS1) of influenza A virus (IAV), coded by its third most diverse gene, interacts with multiple molecules within infected cells. NS1 is involved in host immune response regulation and is a potential contributor to the virus host range. Early phylogenetic analyses using 50 sequences led to the classification of NS1 gene variants into groups (alleles) A and B. We reanalyzed NS1 diversity using 14,716 complete NS IAV sequences, downloaded from public databases, without host bias. Removal of sequence redundancy and further structured clustering at 96.8% amino acid similarity produced 415 clusters that enhanced our capability to detect distinct subgroups and lineages, which were assigned a numerical nomenclature. Maximum likelihood phylogenetic reconstruction using RNA sequences indicated the previously identified deep branching separating group A from group B, with five distinct subgroups within A as well as two and five lineages within the A4 and A5 subgroups, respectively. Our classification model proposes that sequence patterns in thirteen amino acid positions are sufficient to fit >99.9% of all currently available NS1 sequences into the A subgroups/lineages or the B group. This classification reduces host and virus bias through the prioritization of NS1 RNA phylogenetics over host or virus phenetics. We found significant sequence conservation within the subgroups and lineages with characteristic patterns of functional motifs, such as the differential binding of CPSF30 and crk/crkL or the availability of a C-terminal PDZ-binding motif. To understand selection pressures and evolution acting on NS1, it is necessary to organize the available data. This updated classification may help to clarify and organize the study of NS1 interactions and pathogenic differences and allow the drawing of further functional inferences on sequences in each group, subgroup and lineage rather than on a strain-by-strain basis.
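The abstract says that redundant sequences were removed and the remainder clustered at 96.8% amino acid similarity, but it does not name the clustering tool or algorithm. The Python sketch below shows one common way to do this kind of step and should not be read as the authors' method. It assumes a CD-HIT-style greedy incremental scheme, where each sequence joins the first cluster whose representative it matches at or above the threshold and otherwise starts a new cluster. It also assumes that plain identity over pre-aligned, equal-length sequences can stand in for "similarity". The function names and the toy sequences are hypothetical and are not NS1 data.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of identical residues between two pre-aligned, equal-length sequences.

    Assumption: identity is used as a stand-in for the "similarity" in the abstract.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def deduplicate(seqs: Dict[str, str]) -> Dict[str, str]:
    """Redundancy removal: keep only the first identifier seen for each distinct sequence."""
    first_name_for_seq: Dict[str, str] = {}
    for name, seq in seqs.items():
        first_name_for_seq.setdefault(seq, name)
    return {name: seq for seq, name in first_name_for_seq.items()}


def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.968) -> List[List[str]]:
    """Hypothetical greedy incremental clustering (CD-HIT style).

    Each sequence is compared against existing cluster representatives in order.
    It joins the first cluster whose representative meets the identity threshold,
    otherwise it becomes the representative of a new cluster.
    """
    clusters: List[List[str]] = []
    representatives: List[str] = []
    # Names are sorted only to make the toy result deterministic.
    for name in sorted(seqs):
        seq = seqs[name]
        for idx, rep in enumerate(representatives):
            if identity(seq, rep) >= threshold:
                clusters[idx].append(name)
                break
        else:
            representatives.append(seq)
            clusters.append([name])
    return clusters


# Hypothetical 40-residue toy sequences (not real NS1 sequences).
BASE = "ACDEFGHIKL" * 4
TOY = {
    "s1": BASE,
    "s2": BASE,                # exact duplicate of s1, removed by deduplicate()
    "s3": "W" + BASE[1:],      # 1 mismatch: 39/40 = 97.5% identity, joins s1
    "s4": "WW" + BASE[2:],     # 2 mismatches: 38/40 = 95.0% identity, new cluster
}

if __name__ == "__main__":
    unique = deduplicate(TOY)
    print(greedy_cluster(unique))
```

With 40-residue sequences, a single mismatch (97.5%) clears the 96.8% cut-off and two mismatches (95.0%) do not. That is why `s3` joins the `s1` cluster and `s4` starts its own. Real pipelines would also need a pairwise or multiple alignment step for sequences of unequal length, which this sketch deliberately omits.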

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adaptor Proteins, Signal Transducing / metabolism
  • Amino Acid Motifs
  • Amino Acid Sequence
  • Amino Acids / metabolism
  • Base Sequence
  • Cluster Analysis
  • Conserved Sequence*
  • Likelihood Functions
  • Molecular Sequence Data
  • Nuclear Proteins / metabolism
  • PDZ Domains
  • Phylogeny*
  • Protein Binding
  • Proto-Oncogene Proteins c-crk / metabolism
  • RNA, Viral / genetics
  • Sumoylation
  • Viral Nonstructural Proteins / chemistry*
  • Viral Nonstructural Proteins / genetics*

Substances

  • Adaptor Proteins, Signal Transducing
  • Amino Acids
  • CRKL protein
  • INS1 protein, influenza virus
  • Nuclear Proteins
  • Proto-Oncogene Proteins c-crk
  • RNA, Viral
  • Viral Nonstructural Proteins

Grants and funding

Project supported by Consejo Nacional de Ciencia y Tecnología grant CB2010-155382 and Instituto De Ciencia Y Tecnologia Del Distrito Federal grant PIFUTP09-281. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.