Assessing Exhaustiveness of Stochastic Sampling for Integrative Modeling of Macromolecular Structures

Shruthi Viswanath; Ilan E Chemmama; Peter Cimermancic; Andrej Sali

doi:10.1016/j.bpj.2017.10.005

Biophys J. 2017 Dec 5;113(11):2344-2353. doi: 10.1016/j.bpj.2017.10.005.

Authors

Shruthi Viswanath¹, Ilan E Chemmama², Peter Cimermancic³, Andrej Sali⁴

Affiliations

¹ Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California. Electronic address: shruthi@salilab.org.
² Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California; Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California.
³ Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California.
⁴ Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California; Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California; Institute of Quantitative Biosciences, University of California San Francisco, San Francisco, California. Electronic address: sali@salilab.org.

Abstract

Modeling of macromolecular structures involves structural sampling guided by a scoring function, resulting in an ensemble of good-scoring models. By necessity, the sampling is often stochastic, and must be exhaustive at a precision sufficient for accurate modeling and assessment of model uncertainty. Therefore, the very first step in analyzing the ensemble is an estimation of the highest precision at which the sampling is exhaustive. Here, we present an objective and automated method for this task. As a proxy for sampling exhaustiveness, we evaluate whether two independently and stochastically generated sets of models are sufficiently similar. The protocol includes testing 1) convergence of the model score, 2) whether model scores for the two samples were drawn from the same parent distribution, 3) whether each structural cluster includes models from each sample proportionally to its size, and 4) whether there is sufficient structural similarity between the two model samples in each cluster. The evaluation also provides the sampling precision, defined as the smallest clustering threshold that satisfies the third, most stringent test. We validate the protocol with the aid of enumerated good-scoring models for five illustrative cases of binary protein complexes. Passing the proposed four tests is necessary, but not sufficient, for thorough sampling. The protocol is general in nature and can be applied to the stochastic sampling of any set of models, not just structural models. In addition, the tests can be used to stop stochastic sampling as soon as exhaustiveness at the desired precision is reached, thereby improving sampling efficiency; they may also help in selecting a model representation that is sufficiently detailed to be informative, yet also sufficiently coarse for sampling to be exhaustive.
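
Tests 2 and 3 compare two independent model samples, and a short sketch may make that comparison concrete. The abstract does not name the statistics used, so the choices below are assumptions made for illustration only: a two-sample Kolmogorov-Smirnov statistic for the score distributions, and Cramér's V on a clusters-by-samples count table for the cluster populations. The function names `ks_statistic` and `cramers_v` are hypothetical and are not taken from the authors' software.

```python
from bisect import bisect_right
from math import sqrt


def ks_statistic(scores_a, scores_b):
    """Two-sample Kolmogorov-Smirnov statistic D (assumed choice for test 2).

    D is the largest gap between the empirical cumulative distributions
    of the two score samples; 0 means identical, 1 means fully separated.
    """
    a, b = sorted(scores_a), sorted(scores_b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of sample A <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of sample B <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


def cramers_v(table):
    """Cramer's V effect size (assumed choice for test 3).

    table[i] = (models from sample A, models from sample B) in cluster i.
    A value near 0 means each cluster draws from both samples in
    proportion to the sample sizes.
    """
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for row in table:
        row_total = sum(row)
        if row_total == 0:
            continue  # an empty cluster contributes nothing
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(col_totals)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical cluster counts: both samples populate each cluster equally.
balanced = [(50, 50), (30, 30)]
# Hypothetical worst case: each cluster holds models from one sample only.
segregated = [(10, 0), (0, 10)]

print(ks_statistic([1, 2, 3], [1, 2, 3]))  # 0.0
print(cramers_v(balanced))                 # 0.0
print(cramers_v(segregated))               # 1.0
```

Under these assumptions, the sampling precision described in the abstract would be found by repeating the cluster-population check over increasing clustering thresholds and reporting the smallest threshold at which it passes. The cutoffs that count as passing are not given here and would need to come from the full paper.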
