Discovery and Analyses of Caulimovirid-like Sequences in Upland Cotton (Gossypium hirsutum)

Viruses. 2023 Jul 28;15(8):1643. doi: 10.3390/v15081643.

Abstract

Analyses of Illumina-based high-throughput sequencing data generated during characterization of the cotton leafroll dwarf virus population in Mississippi (2020-2022) consistently yielded contigs varying in size (most frequently from 4 to 7 kb) with identical nucleotide content and sharing similarities with reverse transcriptases (RTases) encoded by extant plant pararetroviruses (family Caulimoviridae). These initial data prompted an in-depth study involving molecular and bioinformatic approaches to characterize the nature and origins of these caulimovirid-like sequences. Here, we report endogenous viral elements (EVEs) related to extant members of the family Caulimoviridae and integrated into the genome of upland cotton (Gossypium hirsutum), for which we propose the provisional name "endogenous cotton pararetroviral elements" (eCPRVE). Our investigations pinpointed a ~15 kb-long locus on the A04 chromosome consisting of tandem copies in head-to-head orientation, located on the positive- and negative-sense DNA strands (eCPRVE+ and eCPRVE-). The eCPRVE+ sequence comprised a nearly complete, slightly decayed genome, including ORFs coding for the viral movement protein (MP), coat protein (CP), RTase, and transactivator/viroplasm protein (TA). Phylogenetic analyses of the major viral proteins suggest that eCPRVE+ may have been initially derived from the genome of a cognate virus belonging to a putative new genus within the family. Unexpectedly, an identical 15 kb-long locus composed of two eCPRVE copies was also detected in the newly recognized species G. ekmanianum, shedding some light on the relatively recent evolution within the cotton family.

Keywords: Caulimoviridae; cotton; endogenous form; episomal form; genome integration; pararetrovirus; virus.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology*
  • Gossypium*
  • High-Throughput Nucleotide Sequencing
  • Movement
  • Phylogeny

Grants and funding

This research was partially funded by USDA-ARS NACA 58-6066-9-033, Cotton Inc. grant 17-2021, a Special Research Initiative (SRI) of MAFES/Mississippi State University 2021, and the National Cotton Council.