Fine-tuning molecular mechanics force fields to experimental free energy measurements

bioRxiv [Preprint]. 2025 Jan 8:2025.01.06.631610. doi: 10.1101/2025.01.06.631610.

Abstract

Alchemical free energy methods using molecular mechanics (MM) force fields are essential tools for predicting thermodynamic properties of small molecules, especially via free energy calculations that can estimate quantities relevant for drug discovery such as affinities, selectivities, the impact of target mutations, and ADMET properties. While traditional MM force fields rely on hand-crafted, discrete atom types and parameters, modern approaches based on graph neural networks (GNNs) learn continuous embedding vectors that represent chemical environments from which MM parameters can be generated. Excitingly, GNN parameterization approaches provide a fully end-to-end differentiable model that offers the possibility of systematically improving these models using experimental data. In this study, we treat a pretrained GNN force field (here, espaloma-0.3.2) as a foundation simulation model and fine-tune its charge model using limited quantities of experimental hydration free energy data, with the goal of assessing the degree to which this can systematically improve the prediction of other related free energies. We demonstrate that a highly efficient "one-shot fine-tuning" method using an exponential (Zwanzig) reweighting free energy estimator can improve prediction accuracy without the need to resimulate molecular configurations. To achieve this "one-shot" improvement, we demonstrate the importance of using effective sample size (ESS) regularization strategies to retain good overlap between initial and fine-tuned force fields. Moreover, we show that leveraging low-rank projections of embedding vectors can achieve accuracy improvements comparable to those of higher-dimensional approaches in a variety of data-size regimes. Our results demonstrate that linearly perturbative fine-tuning of foundation model electrostatic parameters to limited experimental data offers a cost-effective strategy that achieves state-of-the-art performance in predicting hydration free energies on the FreeSolv dataset.
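To make the reweighting idea concrete, the following is a minimal sketch of an exponential (Zwanzig) estimator together with an effective sample size diagnostic. It is not the authors' implementation. It assumes reduced potential energies (in units of kT) evaluated on one fixed set of configurations sampled with the initial force field, and it uses the Kish definition of ESS, which may differ from the regularizer used in the study. The function names `zwanzig_delta_f` and `effective_sample_size` are hypothetical.

```python
import numpy as np


def zwanzig_delta_f(u_old, u_new):
    """Free energy change (in kT) from the initial to the perturbed force field.

    Implements dF = -ln < exp(-(u_new - u_old)) >_old, where the average runs
    over configurations sampled with the initial force field.
    """
    du = np.asarray(u_new, dtype=float) - np.asarray(u_old, dtype=float)
    # Shift by the largest exponent before exponentiating (log-mean-exp trick)
    # so that large energy differences do not overflow.
    shift = np.max(-du)
    return -(shift + np.log(np.mean(np.exp(-du - shift))))


def effective_sample_size(u_old, u_new):
    """Kish effective sample size of the reweighting factors exp(-(u_new - u_old)).

    Equals the number of samples when all weights are equal and approaches 1
    when a single configuration dominates, signalling poor overlap.
    """
    log_w = -(np.asarray(u_new, dtype=float) - np.asarray(u_old, dtype=float))
    w = np.exp(log_w - np.max(log_w))  # overall scale cancels in the ratio
    return np.sum(w) ** 2 / np.sum(w ** 2)
```

In a fine-tuning loop of this kind, `u_new` would be recomputed from the stored configurations each time the parameters change, so no new simulation is required. A penalty that grows as the ESS falls is one way to keep the fine-tuned force field close enough to the initial one for the reweighted estimate to remain reliable.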

Publication types

  • Preprint