This chapter addresses a fundamental question: do the sequences of protein domains with sandwich architecture share common characteristics even though they belong to different superfamilies and folds? The analysis was carried out in two stages: (1) determination of domain substructures shared by all sandwich proteins and (2) detection of common sequence characteristics within those substructures. Analysis of supersecondary structures in protein domains revealed two types of four-strand substructures that are common to sandwich proteins. At least one of these substructures was found in proteins of 42 sandwich-like folds (per the structural classification in the CATH database). A comparison of the sequence fragments and residue-residue contacts that constitute the common substructures revealed specific distributions of hydrophobic residues in these chains. The shared sequence and structural characteristics can be conceptualized as the "grammatical rules of beta protein linguistics." Understanding the structural and sequence commonalities of sandwich proteins may prove useful for rational protein design.
Keywords: Beta protein linguistics; Beta-sandwich architecture; Contacts between beta-strands; Sandwich substructures; Sequence comparison; Structural bioinformatics; Supersecondary structure.
© 2025. The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature.