RabbitV: fast detection of viruses and microorganisms in sequencing data on multi-core architectures

Bioinformatics. 2022 May 13;38(10):2932-2933. doi: 10.1093/bioinformatics/btac187.

Abstract

Motivation: Detection and identification of viruses and microorganisms in sequencing data plays an important role in pathogen diagnosis and research. However, existing tools for this problem often suffer from high runtimes and memory consumption.

Results: We present RabbitV, a tool for rapid detection of viruses and microorganisms in Illumina sequencing datasets based on fast identification of unique k-mers. It can exploit the power of modern multi-core CPUs by using multi-threading, vectorization and fast data parsing. Experiments show that RabbitV outperforms fastv by a factor of at least 42.5 and 14.4 in unique k-mer generation (RabbitUniq) and pathogen identification (RabbitV), respectively. Furthermore, RabbitV is able to detect COVID-19 from 40 samples of sequencing data (255 GB in FASTQ format) in only 320 s.

Availability and implementation: RabbitUniq and RabbitV are available at https://github.com/RabbitBio/RabbitUniq and https://github.com/RabbitBio/RabbitV.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • COVID-19*
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Sequence Analysis, DNA
  • Software
  • Viruses* / genetics