Unsupervised evolution of protein and antibody complexes with a structure-informed language model

Science. 2024 Jul 5;385(6704):46-53. doi: 10.1126/science.adk8946. Epub 2024 Jul 4.

Abstract

Large language models trained on sequence information alone can learn high-level principles of protein design. However, beyond sequence, the three-dimensional structures of proteins determine their specific function, activity, and evolvability. Here, we show that a general protein language model augmented with protein structure backbone coordinates can guide evolution for diverse proteins without the need to model individual functional tasks. We also demonstrate that ESM-IF1, an inverse folding model that was trained only on single-chain structures, can be extended to engineer protein complexes. Using this approach, we screened about 30 variants of two therapeutic clinical antibodies used to treat severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. We achieved up to 25-fold improvement in neutralization and 37-fold improvement in affinity against antibody-escaped viral variants of concern BQ.1.1 and XBB.1.5, respectively. These findings highlight the advantage of integrating structural information to identify efficient protein evolution trajectories without requiring any task-specific training data.
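
The abstract does not spell out the scoring procedure, so the following is only a minimal sketch of one common way a structure-conditioned model can nominate a small variant panel: rank every single-residue substitution by how much more likely the model finds it than the wild-type residue at that position. The function `rank_single_mutants`, the `log_probs` input (one dictionary of per-residue log-probabilities for each position, assumed to come from a model conditioned on backbone coordinates), and the default `top_k=30` (chosen to mirror the "about 30 variants" screened) are all hypothetical names and assumptions for illustration. This is not the ESM-IF1 API and not necessarily the authors' method.

```python
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def rank_single_mutants(
    wild_type: str,
    log_probs: List[Dict[str, float]],
    top_k: int = 30,
) -> List[Tuple[str, float]]:
    """Rank single-residue substitutions by log-likelihood ratio vs. wild type.

    `log_probs[i][aa]` is assumed (hypothetically) to be the model's
    log-probability of amino acid `aa` at position i given the backbone.
    Returns up to `top_k` (mutation, score) pairs, best first, where the
    mutation is written as e.g. "A1G" (1-based position).
    """
    if len(wild_type) != len(log_probs):
        raise ValueError("wild_type and log_probs must have the same length")

    scored: List[Tuple[str, float]] = []
    for pos, wt_res in enumerate(wild_type):
        site = log_probs[pos]
        if wt_res not in site:
            raise ValueError(f"no log-probability for wild-type residue at {pos + 1}")
        for aa in AMINO_ACIDS:
            # Skip the wild-type residue and residues the model did not score.
            if aa == wt_res or aa not in site:
                continue
            # Positive score: the model prefers the substitution to wild type.
            score = site[aa] - site[wt_res]
            scored.append((f"{wt_res}{pos + 1}{aa}", score))

    # Highest log-likelihood ratio first; keep only the top candidates.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

Under these assumptions, a caller would pass the wild-type sequence together with the per-position log-probabilities and send the returned shortlist to experimental screening. No functional labels enter the ranking, which is the sense in which such a selection requires no task-specific training data.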

MeSH terms

  • Antibodies, Neutralizing* / chemistry
  • Antibodies, Neutralizing* / genetics
  • Antibodies, Neutralizing* / immunology
  • Antibodies, Viral* / chemistry
  • Antibodies, Viral* / genetics
  • Antibodies, Viral* / immunology
  • Antibody Affinity
  • Antigen-Antibody Complex / chemistry
  • COVID-19 / immunology
  • COVID-19 / virology
  • Directed Molecular Evolution* / methods
  • Humans
  • Models, Molecular
  • Protein Conformation
  • Protein Engineering
  • SARS-CoV-2 / genetics
  • SARS-CoV-2 / immunology

Substances

  • Antibodies, Neutralizing
  • Antibodies, Viral
  • Antigen-Antibody Complex

Supplementary concepts

  • SARS-CoV-2 variants