Development of quantitative structure-property relationship models and a tool for predicting the soil adsorption coefficient (logKOC)

Environ Pollut. 2025 Jan 16:368:125703. doi: 10.1016/j.envpol.2025.125703. Online ahead of print.

Abstract

The soil/sediment organic carbon sorption coefficient (KOC) of organic substances is an indispensable environmental behavior parameter in chemicals management. Because experimental measurement of KOC is usually expensive and time-consuming, predictive methods are vital for filling KOC data gaps. In this study, quantitative structure-property relationship (QSPR) models were developed using a data set of 1477 experimental logKOC values and seven typical machine learning algorithms. Three types of optimum models were obtained: one univariate model based on the logarithm of the n-octanol/water partition coefficient (logKOW), four logKOW-based multivariable machine learning models, and three non-logKOW-based multivariable machine learning models. Internal validation (goodness-of-fit and robustness) and external validation (predictive ability) showed that all optimum models perform well: every model met the acceptance criteria for goodness-of-fit (R²_train > 0.700), robustness (Q²_LOO, Q²_LMO and Q²_BOOT > 0.600) and predictive ability (Q²_EXT > 0.700, CCC > 0.850, r²_m > 0.500, Δr²_m < 0.200). For convenient use, a software tool named "logKOC Predictor" was developed that implements these three types of optimum models. An external data set of 70 experimental logKOC values was then used to test the predictive ability of "logKOC Predictor". The results show that the tool can reliably estimate the unknown logKOC value of a target substance, provided that the substance falls within the applicability domain of the developed models and its prediction is flagged as "High reliability".
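The validation statistics listed above can be made concrete with a short sketch. The plain-Python example below (no third-party packages) fits a univariate model of the assumed linear form logKOC = a·logKOW + b; the abstract does not state the actual form or coefficients of the study's univariate model. It then computes R²_train, Q²_LOO, Q²_EXT, CCC, r²_m and Δr²_m and checks them against the thresholds quoted above. The metric definitions follow common QSAR conventions and are assumptions: Q²_EXT is computed in the Q²_F1 style (residuals relative to the training-set mean), CCC is Lin's concordance correlation coefficient, and r²_m/Δr²_m follow Roy's formulation with both axis orders. Q²_LMO and Q²_BOOT are omitted. All function names and the toy numbers are hypothetical and are not taken from the study or from "logKOC Predictor".

```python
import math

# Hypothetical toy values (logKOW, logKOC) for illustration only; NOT data from the study.
X_TRAIN = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5]
Y_TRAIN = [1.25, 1.45, 1.85, 2.05, 2.35, 2.75, 3.05, 3.25, 3.65, 3.85]
X_EXT = [1.2, 2.8, 4.2, 5.2]
Y_EXT = [1.30, 2.30, 3.10, 3.75]


def mean(v):
    return sum(v) / len(v)


def fit_ols(x, y):
    """Ordinary least squares for the ASSUMED univariate form logKOC = a * logKOW + b."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


def predict(coef, x):
    a, b = coef
    return [a * xi + b for xi in x]


def r2_about(y, yhat, ref_mean):
    """1 - SS_res / SS_tot, with SS_tot taken about ref_mean."""
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, yhat))
    ss_tot = sum((yi - ref_mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def q2_loo(x, y):
    """Leave-one-out Q2: refit without point i, predict it, then 1 - PRESS / SS_tot."""
    preds = []
    for i in range(len(x)):
        coef = fit_ols(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(coef, [x[i]])[0])
    return r2_about(y, preds, mean(y))


def ccc(obs, pred):
    """Lin's concordance correlation coefficient between observed and predicted values."""
    mo, mp = mean(obs), mean(pred)
    s_op = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    s_oo = sum((o - mo) ** 2 for o in obs)
    s_pp = sum((p - mp) ** 2 for p in pred)
    return 2 * s_op / (s_oo + s_pp + len(obs) * (mo - mp) ** 2)


def _rm2(y, x):
    """r2 * (1 - sqrt(|r2 - r0^2|)), with r0^2 from regressing y on x through the origin."""
    my, mx = mean(y), mean(x)
    s_yy = sum((v - my) ** 2 for v in y)
    s_xx = sum((v - mx) ** 2 for v in x)
    s_xy = sum((a - my) * (b - mx) for a, b in zip(y, x))
    r2 = s_xy ** 2 / (s_yy * s_xx)
    k = sum(a * b for a, b in zip(y, x)) / sum(b * b for b in x)
    r0_2 = 1 - sum((a - k * b) ** 2 for a, b in zip(y, x)) / s_yy
    return r2 * (1 - math.sqrt(abs(r2 - r0_2)))


def rm2_metrics(obs, pred):
    """Average r2_m and delta r2_m (Roy-style), computing r2_m with both axis orders."""
    rm2, rm2_swapped = _rm2(obs, pred), _rm2(pred, obs)
    return (rm2 + rm2_swapped) / 2, abs(rm2 - rm2_swapped)


def evaluate(x_train, y_train, x_ext, y_ext):
    coef = fit_ols(x_train, y_train)
    pred_ext = predict(coef, x_ext)
    m = {
        "slope_intercept": coef,
        "R2_train": r2_about(y_train, predict(coef, x_train), mean(y_train)),
        "Q2_LOO": q2_loo(x_train, y_train),
        # Q2_F1-style external Q2: SS_tot about the *training* mean (one common convention).
        "Q2_EXT": r2_about(y_ext, pred_ext, mean(y_train)),
        "CCC": ccc(y_ext, pred_ext),
    }
    m["r2_m"], m["delta_r2_m"] = rm2_metrics(y_ext, pred_ext)
    return m


def meets_criteria(m):
    """Thresholds quoted in the abstract (Q2_LMO and Q2_BOOT omitted here)."""
    return (m["R2_train"] > 0.700 and m["Q2_LOO"] > 0.600 and m["Q2_EXT"] > 0.700
            and m["CCC"] > 0.850 and m["r2_m"] > 0.500 and m["delta_r2_m"] < 0.200)


if __name__ == "__main__":
    metrics = evaluate(X_TRAIN, Y_TRAIN, X_EXT, Y_EXT)
    print(metrics, meets_criteria(metrics))
```

Running the script prints the metric dictionary and whether every threshold is met. For a multivariable learner, `fit_ols`, `predict` and `q2_loo` would need to be replaced; the external metrics (`r2_about`, `ccc`, `rm2_metrics`) only need observed and predicted values.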

Keywords: Machine learning; Organic substance; Predictive tool; Reliability evaluation; Soils/sediments organic carbon sorption coefficient; n-Octanol/water partition coefficient.