cOSPREY: A Cloud-Based Distributed Algorithm for Large-Scale Computational Protein Design

J Comput Biol. 2016 Sep;23(9):737-49. doi: 10.1089/cmb.2015.0234. Epub 2016 May 6.

Abstract

Finding the global minimum energy conformation (GMEC) in a huge combinatorial search space is the key challenge in computational protein design (CPD). Traditional algorithms lack a scalable and efficient distributed design scheme, which prevents researchers from taking full advantage of current cloud infrastructures. We design cloud OSPREY (cOSPREY), an extension of the widely used protein design software OSPREY, that allows the original design framework to scale to commercial cloud infrastructures. We propose several novel designs that integrate algorithm and system optimizations, such as GMEC-specific pruning, state search partitioning, asynchronous algorithm state sharing, and fault tolerance. We evaluate cOSPREY on three different cloud platforms using different technologies and show that it can solve a number of large-scale protein design problems that could not be solved with previous approaches.
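
To make the ideas named above concrete, the following is a minimal illustrative sketch, not cOSPREY's or OSPREY's actual implementation. It assumes a toy pairwise energy model with synthetic random energies, partitions the search by the rotamer choice at the first position, and lets worker threads share the best energy found so far so that each can prune branches whose lower bound cannot beat it. All names (make_toy_problem, SharedBest, solve_partitioned, and so on) are hypothetical, and fault tolerance is not modeled.

```python
import itertools
import random
import threading
from concurrent.futures import ThreadPoolExecutor


def make_toy_problem(n_pos=5, n_rot=4, seed=0):
    """Synthetic self (e1) and pairwise (e2) energies; not real protein data."""
    rng = random.Random(seed)
    e1 = [[rng.uniform(-1, 1) for _ in range(n_rot)] for _ in range(n_pos)]
    e2 = {(i, j): [[rng.uniform(-1, 1) for _ in range(n_rot)] for _ in range(n_rot)]
          for i in range(n_pos) for j in range(i + 1, n_pos)}
    return e1, e2


def energy(conf, e1, e2):
    """Energy of the assigned prefix conf[0..k-1] (self plus pairwise terms)."""
    total = sum(e1[i][r] for i, r in enumerate(conf))
    total += sum(e2[i, j][conf[i]][conf[j]]
                 for i in range(len(conf)) for j in range(i + 1, len(conf)))
    return total


def lower_bound(conf, e1, e2):
    """Admissible bound: exact energy of the prefix plus an optimistic rest."""
    n, k = len(e1), len(conf)
    bound = energy(conf, e1, e2)
    for j in range(k, n):
        bound += min(
            e1[j][r]
            + sum(e2[i, j][conf[i]][r] for i in range(k))
            + sum(min(e2[j, m][r]) for m in range(j + 1, n))
            for r in range(len(e1[j])))
    return bound


class SharedBest:
    """Best energy seen so far, shared by all workers (stand-in for state sharing)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.energy = float("inf")
        self.conf = None

    def offer(self, e, conf):
        with self.lock:
            if e < self.energy:
                self.energy, self.conf = e, tuple(conf)


def search(conf, e1, e2, best):
    """Depth-first branch and bound below the partial assignment conf."""
    # An unlocked, possibly stale read only weakens pruning; it stays correct.
    if lower_bound(conf, e1, e2) >= best.energy:
        return
    if len(conf) == len(e1):
        best.offer(energy(conf, e1, e2), conf)
        return
    for r in range(len(e1[len(conf)])):
        search(conf + [r], e1, e2, best)


def solve_partitioned(e1, e2, workers=4):
    """Split the search space by the first position's rotamer, one task each."""
    best = SharedBest()
    roots = [[r] for r in range(len(e1[0]))]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda root: search(root, e1, e2, best), roots))
    return best.energy, best.conf


def brute_force(e1, e2):
    """Exhaustive reference answer, feasible only for toy sizes."""
    choices = [range(len(rots)) for rots in e1]
    return min(energy(c, e1, e2) for c in itertools.product(*choices))
```

On a toy instance, solve_partitioned(*make_toy_problem()) should return the same minimum energy as brute_force on the same inputs. Threads are used here only to keep the sketch self-contained; a cloud deployment would distribute the partitions across machines and exchange bounds over the network.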

Keywords: MapReduce; branch and bound; cloud; distributed systems; global minimum energy conformation; protein design.

MeSH terms

  • Algorithms*
  • Computational Biology / methods*
  • HIV Envelope Protein gp120 / chemistry
  • Humans
  • Models, Molecular
  • Protein Conformation*
  • Software*

Substances

  • HIV Envelope Protein gp120
  • gp120 protein, Human immunodeficiency virus 1