Summary: In whole genome sequencing data, polymerase chain reaction (PCR) amplification produces duplicate DNA fragments that originate from the same location in the genome. Preparing a whole genome bisulfite sequencing (WGBS) library, on the other hand, can create two DNA fragments from the same location that should not be considered duplicates, because bisulfite conversion leaves the two strands of the original molecule non-complementary, so each strand gives rise to its own fragment. Currently, only one WGBS-aware duplicate marking tool exists. However, it works only with the output of a single aligner, does not accept streaming input or output, and requires a substantial amount of memory relative to the input size. Dupsifter provides an aligner-agnostic duplicate marking tool that is lightweight, supports streaming, and is memory efficient.
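To make the distinction concrete, here is a minimal Python sketch of the general idea. It is not Dupsifter's actual algorithm. It assumes single-end reads have already been reduced to a hypothetical `Read` record holding a 5' reference position, an alignment orientation, a bisulfite conversion label (`"CT"` or `"GA"`), and a quality score. All of these names and fields are illustrative. A conventional marker keys reads on position and orientation only. The WGBS-aware variant also includes the conversion label, so two fragments at the same location that came from different converted strands are not collapsed.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Read:
    """Hypothetical, simplified alignment record (not a real BAM/SAM API)."""
    name: str
    chrom: str
    five_prime: int   # 5' position of the read on the reference
    reverse: bool     # True if the read aligned to the reverse strand
    conversion: str   # assumed bisulfite conversion label: "CT" or "GA"
    quality: int      # e.g. sum of base qualities; higher is better


def dup_key(read: Read, wgbs_aware: bool = True) -> Tuple:
    """Build the key that defines 'same fragment' for duplicate marking."""
    key: Tuple = (read.chrom, read.five_prime, read.reverse)
    if wgbs_aware:
        # Reads from different bisulfite-converted strands are distinct fragments.
        key += (read.conversion,)
    return key


def mark_duplicates(reads: Iterable[Read], wgbs_aware: bool = True) -> List[str]:
    """Return names of reads marked as duplicates.

    For each key, the highest-quality read is kept and every other read
    sharing that key is reported as a duplicate.
    """
    best: Dict[Tuple, Read] = {}
    dups: List[str] = []
    for r in reads:
        k = dup_key(r, wgbs_aware)
        kept = best.get(k)
        if kept is None:
            best[k] = r
        elif r.quality > kept.quality:
            dups.append(kept.name)  # previous representative becomes a duplicate
            best[k] = r
        else:
            dups.append(r.name)
    return dups


reads = [
    Read("a", "chr1", 100, False, "CT", 60),
    Read("b", "chr1", 100, False, "CT", 50),  # PCR duplicate of "a"
    Read("c", "chr1", 100, False, "GA", 55),  # same location, other converted strand
]

print(mark_duplicates(reads))                    # ['b']
print(mark_duplicates(reads, wgbs_aware=False))  # ['b', 'c']
```

With the conversion label in the key, only `b` is flagged. A WGBS-unaware key would also discard `c`, losing a genuinely distinct fragment.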
Availability and implementation: Source code and binaries are freely available at https://github.com/huishenlab/dupsifter under the MIT license. Dupsifter is implemented in C and is supported on macOS and Linux.
© The Author(s) 2023. Published by Oxford University Press.