pubmed.ncbi.nlm.nih.gov

Fitting quantum machine learning potentials to experimental free energy data: predicting tautomer ratios in solution - PubMed

  • ️Fri Jan 01 2021

. 2021 Jul 19;12(34):11364-11381.

doi: 10.1039/d1sc01185e. eCollection 2021 Sep 1.

Affiliations

Fitting quantum machine learning potentials to experimental free energy data: predicting tautomer ratios in solution

Marcus Wieder et al. Chem Sci. 2021.

Abstract

The computation of tautomer ratios of druglike molecules is enormously important in computer-aided drug discovery, as over a quarter of all approved drugs can populate multiple tautomeric species in solution. Unfortunately, accurate calculations of aqueous tautomer ratios-the degree to which these species must be penalized in order to correctly account for tautomers in modeling binding for computer-aided drug discovery-is surprisingly difficult. While quantum chemical approaches to computing aqueous tautomer ratios using continuum solvent models and rigid-rotor harmonic-oscillator thermochemistry are currently state of the art, these methods are still surprisingly inaccurate despite their enormous computational expense. Here, we show that a major source of this inaccuracy lies in the breakdown of the standard approach to accounting for quantum chemical thermochemistry using rigid rotor harmonic oscillator (RRHO) approximations, which are frustrated by the complex conformational landscape introduced by the migration of double bonds, creation of stereocenters, and introduction of multiple conformations separated by low energetic barriers induced by migration of a single proton. Using quantum machine learning (QML) methods that allow us to compute potential energies with quantum chemical accuracy at a fraction of the cost, we show how rigorous relative alchemical free energy calculations can be used to compute tautomer ratios in vacuum free from the limitations introduced by RRHO approximations. Furthermore, since the parameters of QML methods are tunable, we show how we can train these models to correct limitations in the underlying learned quantum chemical potential energy surface using free energies, enabling these methods to learn to generalize tautomer free energies across a broader range of predictions.

This journal is © The Royal Society of Chemistry.

PubMed Disclaimer

Conflict of interest statement

There are no conflicts to declare.

Figures

Fig. 1
Fig. 1. Tautomeric free energy differences in solution are typically calculated using a thermodynamic cycle, independent of the level of theory that is used to obtain the individual terms. A typical quantum chemistry protocol calculates the free energy in aqueous phase as the sum of the free energy in gas phase (; calculated using the ideal gas RRHO approximation) and the standard state transfer free energy (; obtained using a continuum solvation model). The tautomeric free energy difference in solution is then calculated as the difference between for two tautomers.
Fig. 2
Fig. 2. The rigid-rotor harmonic oscillator (RRHO) approach constructs a partition function using the curvature of the potential energy landscape at a minimum modelling all bonded terms as harmonic. The enol and keto form of a molecules (in the enol form: 1-(cyclohex-3-en-1-yl)ethen-1-ol) is shown and the main conformational degrees of freedom highlighted. The middle panel shows how the use of a local and harmonic approximation to the partition function can approximate the overall potential energy landscape, if all relevant minimum conformations are enumerated. The lower panel shows the probability density resulting from the harmonic approximations, with solid colored regions the difference between the true and approximated probability density resulting from anharmonicity and/or bonded terms that would better be modeled using a hindered/free rotor.
Fig. 3
Fig. 3. Alchemical relative free energy calculations with quantum machine learning (QML) potentials like ANI can rigorously compute free energy differences. Linearly interpolating between two potential energy functions using the alchemical coupling parameter λ enables the sampling of bridging distributions between the physical endstates representing the two tautomers. Here, noninteracting dummy atoms—indicated by the empty circle at each endstate—are included to facilitate this transformation. In this work, we present the application of this concept to calculate relative free energies of tautomer pairs in vacuum.
Fig. 4
Fig. 4. State of the art quantum chemistry calculations are able to calculate tautomeric free energy differences ΔtGcalcsolv with a RMSE of 3.1 kcal mol−1. The direction of the tautomer reaction is chosen so that the experimentally obtained tautomeric free energy difference ΔtGexpsolv is always positive. Panel (A) shows ΔtGcalcsolv as the difference between the sum of the gas phase free energy and transfer free energy for each tautomer pair plotted against the experimental tautomeric free energy difference in solution ΔtGexpsolv. B3LYP/aug-cc-pVTZ is used for the gas phase geometry optimization and single point energy calculation, the ideal gas RRHO approximation is used to calculate the thermal corrections. The transfer free energy is calculated on B3LYP/aug-cc-pVTZ/SMD optimized geometries using B3LYP/6-31G(d) and SMD. Values in quadrant II (x-axis entries positive, y-axis entries negative) indicate calculations that assigned the wrong dominant tautomer species (different sign of ΔtGcalcsolv and ΔtGexpsolv). The dashed line indicates the ideal behavior of the calculated and experimental values, the grey lines mark the ±1 kcal mol−1 interval. Red dots indicate tautomer pairs with more than 10 kcal mol−1 absolute error between ΔtGcalcsolv and ΔtGexpsolv. These tautomer pairs are separately shown in Table S.I.1. In panel (B), the top panel shows the kernel density estimate (KDE) and histogram of ΔtGcalcsolv and ΔtGexpsolv. The red line indicates zero free energy difference (equipopulated free energies). In the lower panel the KDE of the difference between ΔtGexpsolv and ΔtGcalcsolv is shown. MAE and RMSE are reported in units of kcal mol−1. Quantities in brackets [X;Y] denote 95% confidence intervals. The Kullback–Leibler divergence (KL) was calculated using KL(ΔtGexpsolv‖ΔtGcalcsolv).
Fig. 5
Fig. 5. Computed free energies are highly sensitive to the selected minimum conformation. For each molecule, the number of minimum conformations Nconf is plotted against the difference between the corresponding highest and lowest obtained free energy value for the minimum conformations. 278 out of 936 molecules have only a single minimum; molecules with multiple minima show substantial free energy differences between the minimum conformations, highlighting the need for a global minimum search.
Fig. 6
Fig. 6. A single, global minimum conformation can be used to calculate tautomeric free energy differences in solution without loss of accuracy. These results are based on the calculations with B3LYP/aug-cc-pVTZ/B3LYP/6-31G(d)/SMD. (A) shows the tautomeric free energy difference ΔtGcalcsolv obtained with multiple minima (m.m.) plotted against the global, single minimum (s.m.) tautomeric free energy difference ΔtGcalcsolv. (B) shows the KDE and histogram of the difference between s.m. ΔtGcalcsolv and m.m. ΔtGcalcsolv. The comparison indicates that there is little benefit using multiple minimum structures over a single, global minimum considering the high costs of the former. Comparing two estimated properties we use root mean squared deviation (RMSD) and mean absolute deviation (MAD) instead of error.
Fig. 7
Fig. 7. Independent of the level of theory, thermochemistry corrections introduce a mean absolute deviation (MAD) of 0.9 kcal mol−1. ANI-1ccx is used for the calculation of tautomeric free energy differences in vacuum ΔtGcalcvac using alchemical relative free energy (AFE) calculations and single point energy calculations on multiple minimum conformations and thermochemistry corrections based on the RRHO approximation. (A) shows a scatter plot between the two approaches and (B) the KDE and histogram of the difference between the two approaches. Error bars are shown in red on the alchemical relative free energy estimates, these were obtained from the MBAR estimate of the relative free energy (see Methods section). Comparing two estimated properties we use root mean squared deviation (RMSD) and mean absolute deviation (MAD) instead of RMSE/MAE.
Fig. 8
Fig. 8. Optimizing QML parameters on a set of experimentally obtained tautomer free energies in solution ΔtGexpsolv enables ANI-1ccx to include crucial solvation effects and improved estimates for tautomeric free energy differences can be obtained by importance weighting from vacuum simulations using the optimized QML parameters. (A) Top panel shows the training (green) and validation (purple) set performance as ΔΔtGsolv. Validation set performance was plotted with a bootstrapped 95% confidence interval. The performance of the optimized parameter set is also shown on the original ANI-1ccx dataset in blue. The best performing parameter set (evaluated on the validation set and indicated by the red dotted line) was selected to evaluate its performance on the test set. The bottom panel shows the MAE for the energy difference between each of the 400 parameter sets and the original parameter set on all the snapshots used for the free energy calculations (≈1,2 million snapshots) split in training/validation and test set as well as the original ANI1-ccx dataset. Figure (B) shows the distribution of ΔtGexpsolv − ΔtGcalcsolv for a hold out test set (71 tautomer pairs) with the native ANI-1ccx (θ) and the optimization parameter set (θ*). The optimized parameter set was able to improve the prediction of tautomeric free energy differences from initial 6.7 kcal mol−1 to 2.8 kcal mol−1 (MAE improved from 5.3 to 2.0 kcal mol−1). The difference in Kullback–Leibler divergence (KL) indicates that the tautomeric free energy differences obtained with the optimized parameter set can reproduce the distribution of the experimental tautomer ratios much better than the free energy differences obtained with the original parameter set.
Fig. 9
Fig. 9. The tautomer dataset shows a wide variety of solvation free energies ΔGexpsolv. (A) shows the ANI-Tautobase subset that was used for the QML calculations and (B) shows the QM-Tautobase subset used for the QM calculations. The full dataset considered for this study was obtained from the DataWorrier File deposited at https://github.com/WahlOya/Tautobase (commit of Jul 23, 2019), described in detail in ref. . The selection criteria for both datasets are described in detail in Detailed methods section.

Similar articles

Cited by

References

    1. Kapetanović I., Drug Discovery and Development: Present and Future, IntechOpen, 2011
    1. Bax B. Chung C. W. Edge C. Getting the chemistry right: protonation, tautomers and the importance of H atoms in biological chemistry. Acta Crystallogr., Sect. D: Struct. Biol. 2017;73(2):131–140. doi: 10.1107/S2059798316020283. - DOI - PMC - PubMed
    1. Sitzmann M. Ihlenfeldt W. D. Nicklaus M. C. Tautomerism in large databases. J. Comput.-Aided Mol. Des. 2010;24(6–7):521–551. doi: 10.1007/s10822-010-9346-4. - DOI - PMC - PubMed
    1. Katritzky A. R. Dennis Hall C. El-Gendy B. E. D. M. Draghici B. Tautomerism in drug discovery. J. Comput.-Aided Mol. Des. 2010;24(6–7):475–484. doi: 10.1007/s10822-010-9359-z. - DOI - PubMed
    1. Martin Y. C. Let's not forget tautomers. J. Comput.-Aided Mol. Des. 2009;23(10):693–704. doi: 10.1007/s10822-009-9303-2. doi: 10.1007/s10822-009-9303-2. - DOI - DOI - PMC - PubMed