A deep siamese neural network improves metagenome-assembled genomes in microbiome datasets across different environments

Nat Commun. 2022 Apr 28;13(1):2326. doi: 10.1038/s41467-022-29843-y.

Abstract

Metagenomic binning is the step in building metagenome-assembled genomes (MAGs) in which sequences predicted to originate from the same genome are automatically grouped together. The most widely used binning methods are reference-independent: they operate de novo and enable the recovery of genomes from previously unsampled clades. However, they do not leverage the knowledge in existing databases. Here, we introduce SemiBin, an open-source tool that uses deep siamese neural networks to implement a semi-supervised approach: SemiBin exploits the information in reference genomes while retaining the ability to reconstruct high-quality bins that lie outside the reference dataset. Using simulated and real microbiome datasets from several different habitats in GMGCv1 (Global Microbial Gene Catalog), including the human gut, non-human guts, and environmental habitats (ocean and soil), we show that SemiBin outperforms existing state-of-the-art binning methods. In particular, SemiBin returns more high-quality bins with greater taxonomic diversity than other methods, including more distinct genera and species.
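As a rough illustration of the siamese idea mentioned above, the following minimal sketch passes two contig feature vectors through the same embedding function (shared weights) and scores the pair with a standard contrastive loss. It is not SemiBin's implementation: the abstract does not state which features, architecture, or loss the tool uses, so the feature vectors, the single linear layer, the `margin` value, and all function names here are hypothetical choices made for the example.

```python
import math

def embed(weights, features):
    """Shared embedding (hypothetical): one linear layer followed by ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, features))) for row in weights]

def distance(a, b):
    """Euclidean distance between two embedded vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(weights, pair, same_genome, margin=1.0):
    """Contrastive loss for one pair of contigs.

    Both contigs go through the SAME weights, which is what makes the
    network "siamese". Pairs labelled as coming from the same genome are
    pulled together; other pairs are pushed at least `margin` apart.
    """
    left, right = pair
    d = distance(embed(weights, left), embed(weights, right))
    if same_genome:
        return d ** 2
    return max(0.0, margin - d) ** 2

# Toy data (assumed): 2-dimensional feature vectors for three contigs.
weights = [[1.0, 0.0], [0.0, 1.0]]   # identity embedding for readability
contig_a = [0.2, 0.8]
contig_b = [0.2, 0.8]                # same features as contig_a
contig_c = [0.9, 0.1]                # clearly different features

loss_same = contrastive_loss(weights, (contig_a, contig_b), same_genome=True)
loss_diff = contrastive_loss(weights, (contig_a, contig_c), same_genome=False)
loss_clash = contrastive_loss(weights, (contig_a, contig_b), same_genome=False)
```

In a semi-supervised setting of the kind the abstract describes, pair labels like `same_genome` could plausibly be derived from reference genomes where available, while the learned embedding is still applied to every contig, including those with no match in the reference set.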

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Metagenome* / genetics
  • Metagenomics / methods
  • Microbiota* / genetics
  • Neural Networks, Computer