Protein superfolds are characterised as frustration-free topologies: A case study of pure parallel β-sheet topologies

PLoS Comput Biol. 2024 Aug 7;20(8):e1012282. doi: 10.1371/journal.pcbi.1012282. eCollection 2024 Aug.

Abstract

A protein superfold is a type of protein fold that is observed in at least three distinct, non-homologous protein families. Structural classification studies have revealed a limited number of prevalent superfolds alongside several infrequently occurring folds. In α/β type superfolds, the C-terminal β-strand tends to favor the edge of the β-sheet, while the N-terminal β-strand is often found in the middle. It remains unclear whether these observations arise from evolutionary sampling bias or from physical interactions. This article offers a physics-based explanation for these observations, specifically for pure parallel β-sheet topologies. Our investigation is grounded in several established structural rules that are based on physical interactions. We have identified "frustration-free topologies," which are topologies that can satisfy all the rules simultaneously. In contrast, topologies that cannot are termed "frustrated topologies." Our findings reveal three things: frustration-free topologies represent only a fraction of all theoretically possible patterns; these topologies strongly favor positioning the C-terminal β-strand at the edge of the β-sheet and the N-terminal β-strand in the middle; and there is significant overlap between frustration-free topologies and superfolds. We also used a lattice protein model to thoroughly investigate sequence-structure relationships. Our results show that frustration-free structures are highly designable, while frustrated structures are poorly designable. These findings suggest that superfolds are highly designable due to their lack of frustration, and that the preference for positioning C-terminal β-strands at the edge of the β-sheet is a direct result of frustration-free topologies. These insights not only enhance our understanding of sequence-structure relationships but also have significant implications for de novo protein design.
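To give a sense of what "all theoretically possible patterns" means for strand order alone, the following minimal sketch enumerates the spatial orders of n strands in a single sheet and computes how often a given strand would sit at an edge if every order were equally likely. It is a combinatorial baseline only and does not implement the structural rules used in the study. The function names `strand_orders` and `edge_fraction` are hypothetical, and the sketch assumes that an order and its reverse describe the same sheet and that connection handedness is ignored.

```python
from fractions import Fraction
from itertools import permutations


def strand_orders(n):
    """Yield each spatial order of strands 1..n once (requires n >= 2).

    Strands are numbered by sequence position (1 = N-terminal, n = C-terminal).
    Assumption: an order and its reverse are the same sheet seen from the
    other side, so only the representative with p[0] < p[-1] is kept.
    """
    for p in permutations(range(1, n + 1)):
        if p[0] < p[-1]:
            yield p


def edge_fraction(n, strand):
    """Fraction of strand orders in which `strand` occupies an edge position."""
    orders = list(strand_orders(n))
    at_edge = sum(1 for p in orders if strand in (p[0], p[-1]))
    return Fraction(at_edge, len(orders))


if __name__ == "__main__":
    for n in range(3, 7):
        total = sum(1 for _ in strand_orders(n))
        print(n, total, edge_fraction(n, n), edge_fraction(n, 1))
```

Under these assumptions there are n!/2 strand orders, and any single strand sits at an edge in a fraction 2/n of them, so the N- and C-terminal strands are equally likely to be at an edge in this unconstrained baseline. A preference for the C-terminal strand at the edge and the N-terminal strand in the middle is therefore an asymmetry that plain counting does not produce.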

MeSH terms

  • Computational Biology / methods
  • Models, Molecular*
  • Protein Conformation, beta-Strand
  • Protein Folding*
  • Proteins* / chemistry

Substances

  • Proteins

Grants and funding

This research was funded by KAKENHI Grant 22H00406 of the Japan Society for the Promotion of Science for G.C. and by JST SPRING, Grant Number JPMJSP2125, for H.M. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.