Analysis, clustering and prediction of the conformation of short and medium size loops connecting regular secondary structures

Pac Symp Biocomput. 1996:570-89.

Abstract

Loops are regions of non-repetitive conformation connecting regular secondary structures. They are both the most difficult and error-prone regions of a protein to solve by X-ray crystallography and the hardest regions to model using knowledge-based procedures. While the core of a protein can be straightforwardly modelled from the structurally conserved regions of homologues of known structure, loops must be modelled from a selected homologue or from a loop chosen from outside the family. Here we present a loop prediction procedure that attempts to identify the conformational class of the loop rather than to select a specific loop from a database of fragments. The structures of some 2083 loops of one to eight residues in length were extracted from a database of 225 protein and protein domain structures. For each loop, the relative disposition of its bounding secondary structures is described by the separation between the tips of their axes and by the angle and dihedral angle between those axes. From the clustering of the loops according to the root mean square deviation of their spatial fit, a total of 162 loop conformational classes, including 79% of loops, were identified. One hundred and eight of these, involving 66% of the loops, were populated by at least four non-homologous loops or four loops sharing a low sequence identity. Another 54 classes, including 13% of the loops, were populated by at least three loops of low sequence similarity from three or fewer non-homologous groups. Most of the previously described loop conformations were found among the populated classes. For each class a template was constructed containing both sequence preferences and the relative disposition of bounding secondary structures among member loops. During comparative modelling, the conformation of a loop can be predicted by identifying a loop class with which its sequence and disposition of bounding secondary structures are compatible.
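
The three quantities used to describe the bounding secondary structures (tip separation, inter-axis angle and dihedral angle) can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so everything here is an assumption made for illustration: each axis is taken as a pair of 3D points, the "tip" is the axis end adjoining the loop, the dihedral is the torsion about the tip-to-tip vector, and the function name `bounding_geometry` is hypothetical.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def bounding_geometry(p1, p2, p3, p4):
    """Describe two secondary-structure axes flanking a loop.

    Assumed convention (not taken from the paper):
      p1 -> p2 is the axis of the first element, p2 being its tip,
      p3 -> p4 is the axis of the second element, p3 being its tip.
    Returns (separation, angle in degrees, dihedral in degrees).
    Axes must have non-zero length.
    """
    axis1 = _sub(p2, p1)
    link = _sub(p3, p2)      # vector joining the two tips
    axis2 = _sub(p4, p3)

    # Distance between the axis tips.
    separation = _norm(link)

    # Angle between the axis directions, clamped against rounding error.
    cos_angle = _dot(axis1, axis2) / (_norm(axis1) * _norm(axis2))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

    # Torsion of axis2 relative to axis1 about the tip-to-tip vector,
    # using the standard atan2 form of the four-point dihedral.
    y = _norm(link) * _dot(axis1, _cross(link, axis2))
    x = _dot(_cross(axis1, link), _cross(link, axis2))
    dihedral = math.degrees(math.atan2(y, x))

    return separation, angle, dihedral

# Toy coordinates, invented for illustration only.
sep, ang, dih = bounding_geometry((1, 0, 0), (0, 0, 0), (0, 0, 3), (0, 1, 3))
print(sep, ang, dih)
```

Under these assumptions, a candidate loop could be compared with a class template by checking whether its three values fall within the ranges observed among the class members; the ranges themselves would have to come from the published templates.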

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Computer Simulation*
  • Conserved Sequence
  • Crystallography, X-Ray
  • Databases, Factual*
  • Models, Molecular*
  • Protein Conformation*
  • Protein Structure, Secondary*
  • Proteins / chemistry*
  • Sequence Alignment
  • Software

Substances

  • Proteins