A structural homology approach for computational protein design with flexible backbone

David Simoncini; Kam Y J Zhang; Thomas Schiex; Sophie Barbe

doi:10.1093/bioinformatics/bty975

Bioinformatics. 2019 Jul 15;35(14):2418-2426. doi: 10.1093/bioinformatics/bty975.

Authors

David Simoncini¹ ², Kam Y J Zhang³, Thomas Schiex⁴, Sophie Barbe¹

Affiliations

¹ Laboratoire d'Ingénierie des Systèmes Biologiques et des Procédés, LISBP, Université de Toulouse, CNRS, INRA, INSA, F Toulouse cedex 04, France.
² Institut de recherche en informatique de Toulouse, IRIT, UMR 5505-CNRS, Université de Toulouse, Cedex 9, France.
³ Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, Yokohama, Kanagawa, Japan.
⁴ Institut de recherche en informatique de Toulouse, UMR 5505-CNRS, Université de Toulouse, Cedex 9, France.

PMID: 30496341
DOI: 10.1093/bioinformatics/bty975

Abstract

Motivation: Structure-based computational protein design (CPD) plays a critical role in advancing the field of protein engineering. Using an all-atom energy function, CPD tries to identify amino acid sequences that fold into a target structure and ultimately perform a desired function. Energy functions remain imperfect, however, and injecting relevant information from known structures into the design process should lead to improved designs.

Results: We introduce Shades (Structural Homology Algorithm for protein DESign), a data-driven CPD method that exploits local structural environments in known protein structures together with energy to guide sequence design, while sampling side-chain and backbone conformations to accommodate mutations. Shades is based on customized libraries of non-contiguous in-contact amino acid residue motifs. We have tested Shades on a public benchmark of 40 proteins selected from different protein families. When homologous proteins were excluded, Shades achieved a protein sequence recovery of 30% and a protein sequence similarity of 46% on average, compared with the PFAM protein family of the target protein. When homologous structures were added, the wild-type sequence recovery rate reached 93%.
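For readers unfamiliar with the two metrics reported in the Results, the following is a minimal sketch of how sequence recovery and sequence similarity are commonly computed for a pair of aligned, equal-length sequences. It is not taken from the Shades code: the function names and the residue grouping used to decide whether two amino acids are "similar" are assumptions made for illustration, and the abstract does not specify the exact definition or the procedure for comparing against the PFAM family.

```python
# Illustrative only: these helpers and the residue classes below are
# hypothetical and are not part of Shades or Rosetta.

# Assumed physico-chemical classes used to call two residues "similar".
SIMILARITY_GROUPS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "G", "P"]
GROUP_OF = {aa: i for i, group in enumerate(SIMILARITY_GROUPS) for aa in group}


def _check(designed: str, reference: str) -> None:
    # Both metrics assume pre-aligned, gap-free sequences of equal length.
    if not designed or len(designed) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")


def sequence_recovery(designed: str, reference: str) -> float:
    """Fraction of positions where the designed residue equals the reference."""
    _check(designed, reference)
    matches = sum(d == r for d, r in zip(designed, reference))
    return matches / len(designed)


def sequence_similarity(designed: str, reference: str) -> float:
    """Fraction of positions where both residues fall in the same assumed class."""
    _check(designed, reference)
    similar = sum(
        d in GROUP_OF and GROUP_OF[d] == GROUP_OF.get(r)
        for d, r in zip(designed, reference)
    )
    return similar / len(designed)


if __name__ == "__main__":
    print(sequence_recovery("AKDE", "VRED"))    # no identical positions
    print(sequence_similarity("AKDE", "VRED"))  # every pair shares a class
```

Under this definition similarity is always at least as high as recovery, which is consistent with the 46% similarity versus 30% recovery reported above.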

Availability and implementation: Shades source code is available at https://bitbucket.org/satsumaimo/shades as a patch for Rosetta 3.8 with a curated protein structure database and ITEM library creation software.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Amino Acid Sequence
Computational Biology
Databases, Protein
Protein Conformation
Proteins
Software*

Substances

Proteins