SARS-CoV-2 Genomic Diversity in Households Highlights the Challenges of Sequence-Based Transmission Inference

mSphere. 2022 Dec 21;7(6):e0040022. doi: 10.1128/msphere.00400-22. Epub 2022 Nov 15.

Abstract

The reliability of sequence-based inference of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) transmission is not clear. Sequence data from infections among household members can define the expected genomic diversity of a virus along a defined transmission chain. SARS-CoV-2 cases were identified prospectively among 2,369 participants in 706 households. Specimens with a reverse transcription-PCR cycle threshold of ≤30 underwent whole-genome sequencing. Intrahost single-nucleotide variants (iSNV) were identified at a ≥5% frequency. Phylogenetic trees were used to evaluate the relationship of household and community sequences. There were 178 SARS-CoV-2 cases in 706 households. Among 147 specimens sequenced, 106 yielded a whole-genome consensus with coverage suitable for identifying iSNV. Twenty-six households had sequences from multiple cases within 14 days. Consensus sequences were indistinguishable among cases in 15 households, while 11 had ≥1 consensus sequence that differed by 1 to 2 mutations. Sequences from households and the community were often interspersed on phylogenetic trees. Identification of iSNV improved inference in 2 of 15 households with indistinguishable consensus sequences and in 6 of 11 with distinct ones. In multiple-infection households, whole-genome consensus sequences differed by 0 to 1 mutations. Identification of shared iSNV occasionally resolved linkage, but the low genomic diversity of SARS-CoV-2 limits the utility of "sequence-only" transmission inference.

IMPORTANCE We performed whole-genome sequencing of SARS-CoV-2 from prospectively identified cases in three longitudinal household cohorts. In a majority of multi-infection households, SARS-CoV-2 consensus sequences were indistinguishable, and they differed by 1 to 2 mutations in the rest. Importantly, even with modest genomic surveillance of the community (3 to 5% of cases sequenced), it was not uncommon to find community sequences interspersed with household sequences on phylogenetic trees. Identification of shared minority variants only occasionally resolved these ambiguities in transmission linkage. Overall, the low genomic diversity of SARS-CoV-2 limits the utility of "sequence-only" transmission inference. Our work highlights the need to carefully consider both epidemiologic linkage and sequence data to define transmission chains in households, hospitals, and other transmission settings.
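
The two comparisons described in the abstract, counting consensus differences between household cases and looking for shared iSNV at a ≥5% frequency, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy 10-base sequences, the variant dictionaries keyed by (position, alternate base), and the choice to skip ambiguous bases are all assumptions made for illustration.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def consensus_distance(seq_a, seq_b):
    """Count differing positions between two aligned consensus sequences.

    Positions where either sequence has an ambiguous base (e.g. N) or a gap
    are skipped. This masking rule is an assumption, not the paper's method.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def shared_isnv(variants_a, variants_b, min_freq=0.05):
    """Return the (position, alt_base) variants at >= min_freq in both specimens.

    The 0.05 default mirrors the >=5% iSNV threshold stated in the abstract.
    """
    kept_a = {key for key, freq in variants_a.items() if freq >= min_freq}
    kept_b = {key for key, freq in variants_b.items() if freq >= min_freq}
    return kept_a & kept_b


# Hypothetical toy household: three aligned consensus sequences.
household = {
    "case_1": "ACGTACGTAC",
    "case_2": "ACGTACGTAC",
    "case_3": "ACGTNCGTAT",
}

# Pairwise consensus differences for every pair of cases in the household.
pairwise = {
    (x, y): consensus_distance(household[x], household[y])
    for x, y in combinations(sorted(household), 2)
}
```

In this toy household, two cases are indistinguishable at the consensus level and the third differs by a single mutation, which is the range the abstract reports. Distances this small are also expected between unrelated community sequences, so a count of 0 or 1 supports linkage only weakly without epidemiologic context.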

Keywords: SARS-CoV-2; genomic epidemiology; household; transmission.

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • COVID-19*
  • Genome, Viral
  • Genomics
  • Humans
  • Phylogeny
  • Reproducibility of Results
  • SARS-CoV-2* / genetics