Evolution of SARS-CoV-2 genome from December 2019 to late March 2020: Emerged haplotypes and informative Tag nucleotide variations

J Med Virol. 2021 Apr;93(4):2010-2020. doi: 10.1002/jmv.26553. Epub 2020 Nov 1.

Abstract

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) causes serious disease in humans. First identified in November/December 2019 in China, it has rapidly spread worldwide. We analyzed 2790 SARS-CoV-2 genome sequences from 56 countries that were available on April 2, 2020, to assess the evolution of the virus during this early phase of its expansion. We aimed to assess sequence variations that had evolved in virus genomes, giving the greatest attention to the S gene. We also aimed to identify haplotypes that the variations may define and consider their geographic and chronologic distribution. Variations at 1930 positions that together cause 1203 amino acid changes were identified. The frequencies of changes normalized to the lengths of genes and encoded proteins were relatively high in ORF3a and relatively low in M. A variation that causes an Asp614Gly substitution near the receptor-binding domain of S was found at a high frequency, and this may contribute to the rapid spread of viruses with this variation. Our most important findings relate to haplotypes. Sixty-six haplotypes that constitute thirteen haplotype groups (H1-H13) were identified, and 84.6% of the 2790 sequences analyzed were associated with these haplotypes. The majority of the sequences (75.1%) were associated with haplotype groups H1-H3. The distribution pattern of the haplotype groups differed in various geographic regions. A few were country/territory specific. The location and time of emergence of some haplotypes are discussed. Importantly, nucleotide variations that define the various haplotypes and Tag/signature variations for most of the haplotypes are reported. The practical applications of these variations are discussed.

Keywords: SARS-CoV-2; Tag SNVs; amino acid changes; haplotypes; nucleotide variations.
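
The abstract does not describe how Tag/signature variations were selected. As a minimal sketch, assume that a Tag variation is one that occurs in exactly one haplotype of the set being compared, so that observing it is enough to assign a sequence to that haplotype. The function name `tag_variations`, the haplotype labels, and the positions and nucleotides below are hypothetical and are not taken from the paper.

```python
from collections import Counter


def tag_variations(haplotypes):
    """Return, for each haplotype, the variations found in no other haplotype.

    `haplotypes` maps a haplotype label to a set of (position, nucleotide)
    pairs. This data layout is an assumption made for illustration only.
    """
    # Count in how many haplotypes each variation occurs.
    counts = Counter(
        variation
        for variants in haplotypes.values()
        for variation in set(variants)
    )
    # A variation seen in exactly one haplotype can act as its tag.
    return {
        name: sorted(v for v in set(variants) if counts[v] == 1)
        for name, variants in haplotypes.items()
    }


# Hypothetical toy input: labels, positions and bases are invented.
toy_haplotypes = {
    "hapA": {(100, "T"), (200, "G")},
    "hapB": {(100, "T"), (300, "A")},
    "hapC": {(100, "T"), (200, "G"), (400, "C")},
}

tags = tag_variations(toy_haplotypes)
for name, found in tags.items():
    print(name, found)
```

In this toy input `hapA` has no variation of its own, so it receives an empty list. Under the assumed definition this is one way a haplotype could end up without a single Tag variation, which would fit the abstract's statement that Tag variations are reported for most, not all, of the haplotypes.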

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • COVID-19 / virology*
  • Evolution, Molecular
  • Genetic Variation*
  • Genome, Viral*
  • Haplotypes
  • Humans
  • Phylogeography
  • SARS-CoV-2 / genetics*
  • Spike Glycoprotein, Coronavirus / genetics*

Substances

  • Spike Glycoprotein, Coronavirus
  • spike protein, SARS-CoV-2