Dupsifter: a lightweight duplicate marking tool for whole genome bisulfite sequencing

Bioinformatics. 2023 Dec 1;39(12):btad729. doi: 10.1093/bioinformatics/btad729.

Abstract

Summary: In whole genome sequencing data, polymerase chain reaction amplification results in duplicate DNA fragments coming from the same location in the genome. The process of preparing a whole genome bisulfite sequencing (WGBS) library, on the other hand, can create two DNA fragments from the same location that should not be considered duplicates. Currently, only one WGBS-aware duplicate marking tool exists. However, it only works with the output from a single tool, does not accept streaming input or output, and requires a substantial amount of memory relative to the input size. Dupsifter provides an aligner-agnostic duplicate marking tool that is lightweight, has streaming capabilities, and is memory efficient.
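The distinction drawn above between true PCR duplicates and WGBS fragments that merely share a location can be illustrated with a minimal sketch. It assumes that the two legitimate fragments come from the two bisulfite-converted strands of the original molecule, labelled here "OT" and "OB". The `Fragment` type, its field names, and the `mark_duplicates` function are hypothetical and chosen only for illustration; this is not Dupsifter's actual implementation or data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical, simplified view of an aligned fragment."""
    name: str
    chrom: str
    start: int
    end: int
    bs_strand: str  # assumed label for the bisulfite strand: "OT" or "OB"


def mark_duplicates(fragments, wgbs_aware=True):
    """Return the names of fragments flagged as duplicates.

    The first fragment seen for a given key is kept; later fragments
    with the same key are flagged. When wgbs_aware is True, the
    bisulfite strand is part of the key, so fragments from different
    strands at the same coordinates are not treated as duplicates.
    """
    seen = set()
    duplicates = set()
    for frag in fragments:
        key = (frag.chrom, frag.start, frag.end)
        if wgbs_aware:
            key += (frag.bs_strand,)
        if key in seen:
            duplicates.add(frag.name)
        else:
            seen.add(key)
    return duplicates


# Toy input: r1 and r2 are identical; r3 shares coordinates but not strand.
fragments = [
    Fragment("r1", "chr1", 100, 250, "OT"),
    Fragment("r2", "chr1", 100, 250, "OT"),
    Fragment("r3", "chr1", 100, 250, "OB"),
    Fragment("r4", "chr2", 500, 640, "OB"),
]

aware = mark_duplicates(fragments, wgbs_aware=True)
naive = mark_duplicates(fragments, wgbs_aware=False)
print(sorted(aware))
print(sorted(naive))
```

In this toy input, a strand-unaware key flags `r3` along with `r2`, whereas the strand-aware key flags only `r2`. A real tool would also have to handle paired-end reads, soft clipping, and the choice of which copy to keep, all of which this sketch omits.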

Availability and implementation: Source code and binaries are freely available at https://github.com/huishenlab/dupsifter under the MIT license. Dupsifter is implemented in C and is supported on macOS and Linux.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • DNA / genetics
  • DNA Methylation*
  • Sequence Analysis, DNA / methods
  • Software
  • Sulfites*
  • Whole Genome Sequencing / methods

Substances

  • hydrogen sulfite
  • Sulfites
  • DNA