Collagens are the foundational component of diverse tissues, including skin, bone, cartilage, and basement membranes, and are the most abundant protein class in animals. The fibrillar collagens are large, complex, multidomain proteins, all containing the characteristic triple helix motif. The most prevalent collagens are heterotrimeric, meaning that cells express at least two distinct procollagen polypeptides that must assemble into specific heterotrimer compositions. The molecular mechanisms ensuring correct heterotrimeric assemblies are poorly understood, even for the most common collagen, type-I. The longstanding paradigm is that assembly is controlled entirely by the ~30 kDa globular C-propeptide (C-Pro) domain. Still, this dominant model for procollagen assembly has left many questions unanswered. Here, we show that the C-Pro paradigm is incomplete. In addition to the critical role of the C-Pro domain in templating assembly, we find that the amino acid sequence near the C terminus of procollagen's triple-helical domain plays an essential role in defining procollagen assembly outcomes. These sequences near the C terminus of the triple-helical domain encode conformationally stabilizing features that ensure only desirable C-Pro-mediated trimeric templates are committed to irreversible triple-helix folding. Incorrect C-Pro trimer assemblies avoid commitment to triple-helix formation thanks to destabilizing features in the amino acid sequences of their triple-helical domains. Incorrect C-Pro assemblies are consequently able to dissociate and search for new binding partners. These findings provide a distinctive perspective on the mechanism of procollagen assembly, revealing the molecular basis by which incorrect homotrimer assemblies are avoided and setting the stage for a deeper understanding of the biogenesis of this ubiquitous protein.
Keywords: endoplasmic reticulum; extracellular matrix; macromolecular complex; procollagen assembly; protein folding.