Bioinformatics pipeline using JUDI: Just Do It!

Soumitra Pal; Teresa M Przytycka

doi:10.1093/bioinformatics/btz956

Bioinformatics. 2020 Apr 15;36(8):2572-2574. doi: 10.1093/bioinformatics/btz956.

Authors

Soumitra Pal¹, Teresa M Przytycka¹

Affiliation

¹ National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

Abstract

Summary: Large-scale data analysis in bioinformatics requires pipelined execution of multiple software tools. Each stage in a pipeline generally consumes considerable computing resources, and several workflow management systems (WMS), e.g. Snakemake, Nextflow, Common Workflow Language and Galaxy, have been developed to ensure optimal execution of the stages across successive invocations of the pipeline. However, when the pipeline needs to be executed with different parameter settings, e.g. thresholds or underlying algorithms, these WMS require significant scripting to ensure optimal execution. We developed JUDI on top of DoIt, a Python-based WMS, to handle parameter settings systematically, based on the principles of database management systems. Using a novel modular approach that encapsulates a parameter database in each task and file associated with a pipeline stage, JUDI simplifies plug-and-play of the pipeline stages. For a typical pipeline with n parameters, JUDI reduces the number of lines of scripting required by a factor of O(n). With properly designed parameter databases, JUDI not only enables reproducing research under published parameter values but also facilitates exploring new results under novel parameter settings.
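
The idea of attaching a parameter database to each task and file can be illustrated with a minimal Python sketch. It is not JUDI's API: the parameter names and values, the helpers param_db, file_path and task_name, and the echo command are all hypothetical and chosen only for illustration. The sketch treats a parameter database as the Cartesian product of parameter values and emits one task dictionary per row, using the 'name', 'file_dep', 'targets' and 'actions' keys that DoIt task generators use.

```python
from itertools import product

# Hypothetical parameters and values, chosen only for illustration.
PARAMS = {
    "sample": ["s1", "s2"],
    "aligner": ["bwa", "bowtie2"],
    "threshold": [0.01, 0.05],
}


def param_db(params, keys=None):
    """Return one dict (row) per combination of the selected parameters."""
    keys = list(params) if keys is None else list(keys)
    value_lists = [params[k] for k in keys]
    return [dict(zip(keys, values)) for values in product(*value_lists)]


def file_path(stem, row):
    """Derive a distinct path for a file under one parameter setting."""
    parts = [f"{key}-{value}" for key, value in row.items()]
    return "/".join(parts + [stem])


def task_name(row):
    """Build a sub-task name that identifies one parameter setting."""
    return "_".join(f"{key}-{value}" for key, value in row.items())


def task_align():
    """Yield one task dict per row of the (sample, aligner) database.

    The input file depends only on 'sample'; the output depends on both
    'sample' and 'aligner'. The action is a placeholder shell command.
    """
    for row in param_db(PARAMS, ["sample", "aligner"]):
        reads = file_path("reads.fq", {"sample": row["sample"]})
        bam = file_path("aln.bam", row)
        yield {
            "name": task_name(row),
            "file_dep": [reads],
            "targets": [bam],
            "actions": [f"echo align {reads} with {row['aligner']} > {bam}"],
        }


if __name__ == "__main__":
    for task in task_align():
        print(task["name"], task["file_dep"], task["targets"])
```

In this sketch, adding a parameter or a value changes only the PARAMS dictionary; the task definition itself does not change, which is the kind of scripting saving the summary describes.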

Availability and implementation: https://github.com/ncbi/JUDI.

Supplementary information: Supplementary data are available at Bioinformatics online.

Published by Oxford University Press 2019. This work is written by US Government employees and is in the public domain in the US.

Publication types

Research Support, N.I.H., Intramural

MeSH terms

Algorithms
Computational Biology*
Language
Software*
Workflow