Breaking free from references: a consensus-based approach for community profiling with long amplicon nanopore data

Brief Bioinform. 2024 Nov 22;26(1):bbae642. doi: 10.1093/bib/bbae642.

Abstract

Third-generation sequencing platforms, such as Oxford Nanopore Technologies (ONT), have made it possible to characterize communities through the sequencing of long amplicons. While this theoretically allows for increased taxonomic resolution compared to short-read sequencing platforms such as Illumina, the high error rate of the reads remains problematic for accurately identifying the community members present within a sample. Here, we present and validate CONCOMPRA, a tool that allows the detection of closely related strains within a community by drafting and mapping to consensus sequences. We show that CONCOMPRA outperforms several other tools for profiling bacterial communities using full-length 16S rRNA gene sequencing. Since CONCOMPRA does not rely on a sequence database for profiling communities, it is applicable to systems and amplicons for which little to no reference data exists. Our validation test shows that the amplification of long PCR products is likely to produce chimeric byproducts that inflate alpha diversity and skew community structure, stressing the importance of chimera detection. CONCOMPRA is available on GitHub (https://github.com/willem-stock/CONCOMPRA).
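
As a rough illustration of why drafting a consensus sequence can compensate for a high per-read error rate, the sketch below takes a per-column majority vote over a handful of noisy reads. It is a toy example under strong assumptions and not CONCOMPRA's actual implementation: the function name `majority_consensus` is hypothetical, the reads are invented, and they are assumed to be pre-aligned, of equal length, and affected by substitutions only, which real nanopore reads are not.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Return the per-column majority base of pre-aligned, equal-length reads.

    Hypothetical helper for illustration only; ties are resolved in favour of
    the base that appears first in the column.
    """
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be pre-aligned to equal length")

    consensus = []
    # zip(*reads) walks the alignment column by column.
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Invented toy data: one error-free copy of "ACGTACGT" plus four copies that
# each carry a single substitution at a different position.
reads = [
    "ACGTACGT",
    "ACCTACGT",
    "ACGTACGA",
    "TCGTACGT",
    "ACGTAGGT",
]
consensus = majority_consensus(reads)
print(consensus)
```

Because the errors in this toy set fall at different positions, each column is still dominated by the true base, so the vote recovers the underlying sequence even though most individual reads are wrong somewhere. A consensus built from a chimeric amplicon would be just as clean, which is one way to see why chimera detection needs to be a separate step.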

Keywords: Oxford nanopore technology; amplicon sequencing; chimera; consensus sequence.

MeSH terms

  • Bacteria / classification
  • Bacteria / genetics
  • Consensus Sequence
  • High-Throughput Nucleotide Sequencing / methods
  • Microbiota / genetics
  • Nanopore Sequencing / methods
  • Nanopores*
  • RNA, Ribosomal, 16S* / genetics
  • Sequence Analysis, DNA / methods
  • Software

Substances

  • RNA, Ribosomal, 16S