MG-RAST version 4-lessons learned from a decade of low-budget ultra-high-throughput metagenome analysis

Folker Meyer; Saurabh Bagchi; Somali Chaterji; Wolfgang Gerlach; Ananth Grama; Travis Harrison; Tobias Paczian; William L Trimble; Andreas Wilke

doi:10.1093/bib/bbx105

Brief Bioinform. 2019 Jul 19;20(4):1151-1159. doi: 10.1093/bib/bbx105.

Abstract

As technologies change, MG-RAST is adapting. Newly available software is being incorporated to improve accuracy and performance. As a computational service that constantly runs large-volume scientific workflows, MG-RAST is the right place to perform benchmarking and to implement algorithmic or platform improvements, which in many cases involve trade-offs between specificity, sensitivity and run-time cost. The work in [Glass EM, Dribinsky Y, Yilmaz P, et al. ISME J 2014;8:1-3] is an example: we use existing, well-studied data sets in MG-RAST as gold standards, representing different environments and different sequencing technologies, to evaluate any change to the pipeline. Artificial data sets have not added value for pipeline performance optimization, because they do not present the same challenges as real-world data sets. In addition, the MG-RAST team welcomes suggestions for improving the workflow. We are currently working on versions 4.02 and 4.1, both of which contain significant input from the community and our partners; they will enable double barcoding, support stronger inferences from longer-read technologies and increase throughput while maintaining sensitivity by using Diamond and SortMeRNA. On the technical platform side, the MG-RAST team intends to support the Common Workflow Language as a standard for specifying bioinformatics workflows, both to facilitate development and to enable efficient, high-performance implementation of the community's data analysis tasks.
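To make the gold-standard evaluation and the sensitivity/specificity/run-time trade-off concrete, here is a minimal sketch that scores a binary read-classification step (for example, flagging reads as rRNA, the kind of task SortMeRNA performs) against gold-standard labels and then decides whether a faster candidate version may replace the baseline. The function names, the acceptance rule, the tolerance value and the toy read IDs and timings are all hypothetical illustrations, not MG-RAST's actual benchmarking code or results.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    sensitivity: float  # TP / (TP + FN)
    specificity: float  # TN / (TN + FP)
    runtime_s: float    # wall-clock time of the pipeline step (seconds)


def evaluate(predicted: set, truth: set, all_reads: set, runtime_s: float) -> BenchmarkResult:
    """Score a binary read classifier against gold-standard labels.

    predicted: read IDs the pipeline step flagged as positive
    truth:     read IDs that are positive in the gold standard
    all_reads: every read ID in the data set
    """
    tp = len(predicted & truth)             # flagged and truly positive
    fp = len(predicted - truth)             # flagged but negative
    fn = len(truth - predicted)             # positive but missed
    tn = len(all_reads - predicted - truth) # neither flagged nor positive
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return BenchmarkResult(sensitivity, specificity, runtime_s)


def accept_change(baseline: BenchmarkResult, candidate: BenchmarkResult,
                  max_sensitivity_loss: float = 0.01) -> bool:
    """Hypothetical rule: accept a faster candidate only if its sensitivity stays
    within max_sensitivity_loss of the baseline and its specificity does not drop."""
    return (candidate.runtime_s < baseline.runtime_s
            and candidate.sensitivity >= baseline.sensitivity - max_sensitivity_loss
            and candidate.specificity >= baseline.specificity)


if __name__ == "__main__":
    # Toy data, not a real gold standard: ten reads, four of which are truly positive.
    reads = {f"r{i}" for i in range(10)}
    gold = {"r0", "r1", "r2", "r3"}
    baseline = evaluate({"r0", "r1", "r2", "r3", "r4"}, gold, reads, runtime_s=120.0)
    candidate = evaluate({"r0", "r1", "r2"}, gold, reads, runtime_s=30.0)
    print(baseline)
    print(candidate)
    print("accept:", accept_change(baseline, candidate))
```

In this toy run the candidate is four times faster and has perfect specificity, but it misses one of the four gold-standard reads, so the illustrative rule rejects it. The point is that a speed gain is weighed against measured sensitivity on real, well-characterized data rather than accepted on run time alone.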

Keywords: cloud; distributed workflows; metagenome analysis.

Published by Oxford University Press on behalf of Entomological Society of America 2017. This work is written by US Government employees and is in the public domain in the US.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms
Budgets
Computational Biology / methods
High-Throughput Nucleotide Sequencing / economics
High-Throughput Nucleotide Sequencing / methods*
High-Throughput Nucleotide Sequencing / statistics & numerical data
Internet
Metagenome*
Metagenomics / economics
Metagenomics / methods*
Metagenomics / statistics & numerical data
Sequence Analysis, DNA / economics
Sequence Analysis, DNA / methods
Sequence Analysis, DNA / statistics & numerical data
Software*
User-Computer Interface
Workflow
