MTBseq: a comprehensive pipeline for whole genome sequence analysis of Mycobacterium tuberculosis complex isolates

PeerJ. 2018 Nov 13;6:e5895. doi: 10.7717/peerj.5895. eCollection 2018.

Abstract

Analyzing whole-genome sequencing data of Mycobacterium tuberculosis complex (MTBC) isolates in a standardized workflow enables both comprehensive antibiotic resistance profiling and outbreak surveillance at the highest resolution, down to the identification of recent transmission chains. Here, we present MTBseq, a bioinformatics pipeline for next-generation genome sequence data analysis of MTBC isolates. Employing a reference-mapping-based workflow, MTBseq reports detected variant positions annotated with known associations to antibiotic resistance and performs a lineage classification based on phylogenetic single nucleotide polymorphisms (SNPs). When comparing multiple datasets, MTBseq provides a joint list of variants and a FASTA alignment of SNP positions for use in phylogenomic analysis, and identifies groups of related isolates. The pipeline is customizable and expandable, and can be used on a desktop computer or laptop without any internet connection, which supports mobile usage and data security. MTBseq and accompanying documentation are available from https://github.com/ngs-fzb/MTBseq_source.
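As an illustration of the comparative step described above, the following minimal Python sketch shows one way groups of related isolates could be derived from a FASTA alignment of SNP positions: count pairwise SNP differences and join isolates whose distance falls at or below a threshold (single-linkage grouping). This is not MTBseq's own code. The function names, the example sequences, and the threshold value are all assumptions made for illustration only.

```python
from itertools import combinations


def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {isolate name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def snp_distance(a, b):
    """Count positions where both sequences carry an unambiguous base and differ."""
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x in "ACGT" and y in "ACGT"
    )


def group_isolates(records, threshold):
    """Single-linkage grouping: link isolates whose SNP distance <= threshold."""
    parent = {name: name for name in records}

    def find(name):
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(records, 2):
        if snp_distance(records[a], records[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in records:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical SNP alignment (made-up isolates and sequences).
EXAMPLE_ALIGNMENT = """>iso1
ACGTACGTAC
>iso2
ACGTACGTAT
>iso3
TTGTACCTAA
"""

if __name__ == "__main__":
    isolates = parse_fasta(EXAMPLE_ALIGNMENT)
    # The threshold of 2 SNPs is an arbitrary value chosen for this toy example.
    print(group_isolates(isolates, threshold=2))
```

With these made-up sequences, iso1 and iso2 differ at one position and are grouped together, while iso3 differs from both at four positions and forms its own group. An appropriate threshold depends on the epidemiological question and is not implied by this sketch.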

Keywords: Antibiotic resistance profiling; Automated analysis pipeline; Bacterial epidemiology; Bacterial genome analysis; Mycobacterium tuberculosis complex; Next-generation sequencing; Phylogeny; Whole-genome sequencing.

Grants and funding

Parts of this work were funded by the European Community’s Seventh Framework Program (FP7/2007-2013) under grant agreement 278864 in the framework of the Patho-NGen-Trace project and the German Center for Infection Research (DZIF). The publication of this article was funded by the Open Access Fund of the Leibniz Association. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.