Deepbinner: Demultiplexing barcoded Oxford Nanopore reads with deep convolutional neural networks

PLoS Comput Biol. 2018 Nov 20;14(11):e1006583. doi: 10.1371/journal.pcbi.1006583. eCollection 2018 Nov.

Abstract

Multiplexing, the simultaneous sequencing of multiple barcoded DNA samples on a single flow cell, has made Oxford Nanopore sequencing cost-effective for small genomes. However, it depends on the ability to sort the resulting sequencing reads by barcode, and current demultiplexing tools fail to classify many reads. Here we present Deepbinner, a tool for Oxford Nanopore demultiplexing that uses a deep neural network to classify reads based on the raw electrical read signal. This 'signal-space' approach allows for greater accuracy than existing 'base-space' tools (Albacore and Porechop) for which signals must first be converted to DNA base calls, itself a complex problem that can introduce noise into the barcode sequence. To assess Deepbinner and existing tools, we performed multiplex sequencing on 12 amplicons chosen for their distinguishability. This allowed us to establish a ground truth classification for each read based on internal sequence alone. Deepbinner had the lowest rate of unclassified reads (7.8%) and the highest demultiplexing precision (98.5% of classified reads were correctly assigned). It can be used alone (to maximise the number of classified reads) or in conjunction with other demultiplexers (to maximise precision and minimise false positive classifications). We also found cross-sample chimeric reads (0.3%) and evidence of barcode switching (0.3%) in our dataset, which likely arise during library preparation and may be detrimental for quantitative studies that use multiplexing. Deepbinner is open source (GPLv3) and available at https://github.com/rrwick/Deepbinner.
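The abstract reports two headline metrics: the rate of unclassified reads, taken over all reads, and precision, the share of classified reads assigned to the correct barcode. It also says Deepbinner can be paired with another demultiplexer to raise precision. The Python sketch below shows how these metrics could be computed from per-read barcode calls and ground-truth labels. It is an illustration, not Deepbinner's own code. The function names `demux_metrics` and `consensus`, the use of `None` for "unclassified", and the "keep a call only when both tools agree" combination rule are all hypothetical choices made here.

```python
from typing import List, Optional, Sequence, Tuple

# A per-read call: a barcode name such as "bc01", or None if the read was left unclassified.
Call = Optional[str]


def demux_metrics(calls: Sequence[Call], truth: Sequence[str]) -> Tuple[float, float]:
    """Return (unclassified_rate, precision) for one set of barcode calls.

    unclassified_rate: fraction of all reads that received no call.
    precision: fraction of classified reads whose call matches the ground truth.
    """
    if len(calls) != len(truth):
        raise ValueError("calls and truth must have the same length")
    if not calls:
        raise ValueError("at least one read is required")
    classified = [(c, t) for c, t in zip(calls, truth) if c is not None]
    unclassified_rate = 1 - len(classified) / len(calls)
    correct = sum(1 for c, t in classified if c == t)
    precision = correct / len(classified) if classified else float("nan")
    return unclassified_rate, precision


def consensus(a: Sequence[Call], b: Sequence[Call]) -> List[Call]:
    """Hypothetical combination rule: keep a call only when both tools give the same barcode."""
    return [x if x is not None and x == y else None for x, y in zip(a, b)]


if __name__ == "__main__":
    truth = ["bc01", "bc02", "bc01", "bc03"]
    tool_a = ["bc01", "bc02", "bc02", None]   # one wrong call, one unclassified read
    tool_b = ["bc01", None, "bc01", "bc03"]
    print(demux_metrics(tool_a, truth))
    print(demux_metrics(consensus(tool_a, tool_b), truth))
```

In this toy example the agreement rule gives up classified reads in exchange for precision: the unclassified rate rises from 25% to 75%, while precision rises from two thirds to 100%. This mirrors, in miniature, the trade-off between using a single demultiplexer and combining several that the abstract describes.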

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Bacteria / genetics
  • Computational Biology / methods*
  • DNA / analysis
  • DNA Barcoding, Taxonomic*
  • Electronic Data Processing
  • Gene Library
  • Genome
  • High-Throughput Nucleotide Sequencing
  • Nanopores
  • Nanotechnology / methods*
  • Neural Networks, Computer*
  • Reproducibility of Results
  • Software

Substances

  • DNA

Grants and funding

This work was supported by the Bill and Melinda Gates Foundation, Seattle (grant number OPP1175797). KEH is a Viertel Foundation of Australia Senior Medical Research Fellow. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.