Protein structure prediction is fundamental to molecular biology and has numerous applications in areas such as drug discovery and protein engineering. Machine learning techniques have greatly advanced protein 3D modeling in recent years, particularly with the development of AlphaFold2 (AF2), which predicts 3D structures from amino acid sequences with near-experimental accuracy. Since the release of AF2, numerous studies have either used AF2 directly for large-scale modeling or built upon the software for other use cases. Many reviews have discussed the impact of AF2 on protein bioinformatics, particularly in relation to neural networks, and have highlighted what AF2 can and cannot do. AF2 and similar approaches remain open to further development, and several new approaches have emerged, alongside older refinement approaches, for improving the quality of predictions. Here we provide a brief overview, aimed at the general biologist, of how machine learning techniques have been used to improve 3D models of proteins following AF2, and we highlight the impacts of these approaches. In the most recent Critical Assessment of Techniques for Protein Structure Prediction experiment (CASP15), the most successful groups all developed their own protein structure modeling tools that were based at least in part on AF2. Their improvements over AF2 involved techniques such as generative modeling, changing parameters such as dropout to generate more AF2 structures, and data-driven approaches including the use of alternative templates and multiple sequence alignments (MSAs).
Keywords: AlphaFold; CASP; Data-driven approaches; Machine learning; Protein refinement.
© 2025. The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature.