Identification of the conjugative and mobilizable plasmid fragments in the plasmidome using sequence signatures

Microb Genom. 2020 Nov;6(11):mgen000459. doi: 10.1099/mgen.0.000459. Epub 2020 Oct 19.

Abstract

Plasmids are the key element in horizontal gene transfer in the microbial community. Recently, a large number of experimental and computational methods have been developed to obtain the plasmidomes of microbial communities. Distinguishing transmissible plasmid sequences, which are derived from conjugative or at least mobilizable plasmids, from non-transmissible plasmid sequences in the plasmidome is essential for understanding the diversity of plasmids and how they regulate the microbial community. Unfortunately, due to the highly fragmented characteristics of DNA sequences in the plasmidome, effective identification methods are lacking. In this work, we used information entropy from information theory to assess the randomness of synonymous codon usage over 4424 plasmid genomes. The results showed that for all amino acids, the choice of a synonymous codon in conjugative and mobilizable plasmids is more random than that in non-transmissible plasmids, indicating that transmissible plasmids have different sequence signatures from non-transmissible plasmids. Inspired by this phenomenon, we further developed a novel algorithm named PlasTrans. PlasTrans takes the triplet code sequences and base sequences of plasmid DNA fragments as input and uses the convolutional neural network of the deep learning technique to further extract the more complex signatures of the plasmid sequences and identify the conjugative and mobilizable DNA fragments. Tests showed that PlasTrans could achieve an AUC of as high as 84-91%, even though the fragments only contained hundreds of base pairs. To the best of our knowledge, this is the first quantitative analysis of the difference in sequence signatures between transmissible and non-transmissible plasmids, and we developed the first tool to perform transferability annotation for DNA fragments in the plasmidome. We expect that PlasTrans will be a useful tool for researchers who analyse the properties of novel plasmids in the microbial community and horizontal gene transfer, especially the spread of resistance genes and virulence factors associated with plasmids. PlasTrans is freely available via https://github.com/zhenchengfang/PlasTrans.

Keywords: deep learning; information theory; plasmid transmissibility; plasmidome; sequence signatures.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacteria / genetics*
  • Base Sequence / genetics
  • Computational Biology / methods*
  • DNA, Bacterial / genetics
  • Gene Transfer, Horizontal / genetics*
  • Genome, Bacterial / genetics
  • Microbiota / genetics
  • Plasmids / genetics*
  • Sequence Analysis, DNA

Substances

  • DNA, Bacterial