Quantifying reproducibility in computational biology: the case of the tuberculosis drugome

PLoS One. 2013 Nov 27;8(11):e80278. doi: 10.1371/journal.pone.0080278. eCollection 2013.

Abstract

How easy is it to reproduce the results found in a typical computational biology paper? Either through experience or intuition, the reader will already know that the answer is "with difficulty" or "not at all." In this paper we attempt to quantify this difficulty by reproducing a previously published paper for different classes of users (ranging from users with little expertise to domain experts) and suggest ways in which the situation might be improved. Quantification is achieved by estimating the time required to reproduce each of the steps in the method described in the original paper and to make them part of an explicit workflow that reproduces the original results. Reproducing the method took several months of effort and required using new versions of software as well as new software, which posed challenges to reconstructing and validating the results. The quantification leads to "reproducibility maps" that reveal that novice researchers would be able to reproduce only a few of the steps in the method, and that only expert researchers with advanced knowledge of the domain would be able to reproduce the method in its entirety. The workflow itself is published as an online resource together with supporting software and data. The paper concludes with a brief discussion of the complexities of requiring reproducibility in terms of cost versus benefit, and a set of desiderata with our observations and guidelines for improving reproducibility. This has implications not only for reproducing the work of others from published papers, but also for reproducing work from one's own laboratory.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Computational Biology / methods*
  • Computational Biology / standards
  • Humans
  • Internet
  • Reproducibility of Results
  • Software

Grants and funding

This research is sponsored by Elsevier Labs, the National Science Foundation with award number IIS-0948429, the Air Force Office of Scientific Research with award number FA9550-11-1-0104, internal funds from the University of Southern California's Information Sciences Institute and from the University of California, San Diego, and by a Formación de Profesorado Universitario grant from the Spanish Ministry of Science and Innovation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.