Protein loops often play important roles in biological function, and modeling them accurately is crucial to determining the functional specificity of a protein. Although progress in loop prediction has produced a number of algorithms over the past decade, few rigorous algorithmic approaches exist for modeling protein loops using global orientational restraints, such as those obtained from residual dipolar coupling (RDC) data in solution nuclear magnetic resonance (NMR) spectroscopy. In this article, we present a novel sparse-data, RDC-based algorithm that exploits the mathematical interplay between RDC-derived sphero-conics and protein kinematics, and formulates the loop structure determination problem as a system of low-degree polynomial equations that can be solved exactly, in closed form. The polynomial roots, which encode the candidate conformations, are searched systematically using provable pruning strategies that triage the vast majority of conformations, so that every possible loop conformation is either enumerated as consistent with the data or provably pruned; completeness is therefore ensured. Results on experimental RDC datasets for four proteins (human ubiquitin, FF2, DinI, and GB3) demonstrate that our algorithm computes loops with higher accuracy, a three- to six-fold improvement in backbone RMSD, than traditional structure determination protocols applied to the same data. Excellent results were also obtained on synthetic RDC datasets for protein loops of lengths 4, 8, and 12 used in previous studies. These results suggest that our algorithm can be applied successfully to determine protein loop conformations and hence will be useful in high-resolution protein backbone structure determination, including loops, from sparse NMR data. Proteins 2012. © 2011 Wiley Periodicals, Inc.
Keywords: algorithms; inverse kinematics; loop closure; nuclear magnetic resonance; protein loops; residual dipolar couplings; sphero-conic; structural biology.