Keeping it in the family: using protein family templates to rescue low confidence AlphaFold2 models

Francesco Costa; Matthias Blum; Alex Bateman

doi:10.1093/bioadv/vbae188

Bioinform Adv. 2024 Nov 25;4(1):vbae188. doi: 10.1093/bioadv/vbae188. eCollection 2024.

Authors

Francesco Costa¹, Matthias Blum¹, Alex Bateman¹

Affiliation

¹ European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, United Kingdom.

Abstract

Motivation: High-confidence structure prediction models have become available for nearly all protein sequences, and more than 200 million AlphaFold2 models are now publicly available. We observe that prediction confidence, as judged by plDDT scores, can vary significantly across a protein family. We have explored whether the lower-plDDT predictions in a family can be improved by supplying higher-plDDT models from the same family as template structures in AlphaFold2.

Results: Our work shows that about one-third of structures with a low plDDT can be "rescued," that is, moved from low to reasonable confidence. Surprisingly, in many cases we obtain a higher-plDDT model when we switch off the multiple sequence alignment (MSA) option in AlphaFold2 and rely solely on a high-quality template. However, we find that the best overall strategy is to make predictions both with and without the MSA information and select the model with the highest average plDDT. We also find that using high-plDDT models as templates can increase the speed of AlphaFold2 as implemented in ColabFold. Additionally, we try to demonstrate that, as well as having increased overall plDDT, the rescued models are likely to have higher-quality structures as judged by two metrics.
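
The selection step described above (predict with and without the MSA, then keep the model with the highest average plDDT) can be sketched as follows. This is a minimal illustration and not the AF2Fix pipeline itself: the function names `mean_plddt` and `select_best_model` and the labels in the usage note are hypothetical, and the sketch assumes PDB-format models in which the per-residue plDDT is stored in the B-factor column, as in AlphaFold2 output, averaged over C-alpha atoms.

```python
from statistics import mean


def mean_plddt(pdb_text: str) -> float:
    """Average the B-factor column (assumed to hold plDDT) over C-alpha atoms."""
    scores = []
    for line in pdb_text.splitlines():
        # PDB fixed-width columns (1-indexed): atom name 13-16, B-factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError("no C-alpha ATOM records found")
    return mean(scores)


def select_best_model(models: dict[str, str]) -> tuple[str, float]:
    """Return the label and mean plDDT of the highest-scoring model.

    `models` maps a hypothetical run label (for example "template_with_msa"
    or "template_only") to the PDB text of the corresponding prediction.
    """
    scored = {label: mean_plddt(text) for label, text in models.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]
```

Calling `select_best_model({"template_with_msa": pdb_a, "template_only": pdb_b})` would return whichever label has the higher mean plDDT, together with that score. Averaging over C-alpha atoms gives one value per residue; averaging over all atoms would be an equally plausible reading of "average plDDT".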

Availability and implementation: We have implemented our pipeline in Nextflow, and it is available on GitHub: https://github.com/FranceCosta/AF2Fix.