Quantifying the contributions of factors to bioaccessible Cd and Pb in soil using machine learning

J Hazard Mater. 2025 Jan 2:487:137102. doi: 10.1016/j.jhazmat.2025.137102. Online ahead of print.

Abstract

The bioaccessibility of cadmium (Cd) and lead (Pb) in the gastrointestinal tract is crucial for health risk assessments of contaminated soils. However, variability in in vitro analytical conditions and soil properties introduces bias and uncertainty in predictions. This study employed three in vitro methods to measure Cd and Pb bioaccessibility during the gastric and gastrointestinal phases, using soil samples incubated for one year. Twelve machine learning models were tested, and Random Forest was chosen for its superior performance, achieving R² values between 0.74 and 0.82 on the test set. Key experimental conditions, including Cl⁻ concentration and extraction pH, were among the top five factors influencing bioaccessibility. Despite identical incubation conditions, bioaccessible Cd and Pb varied significantly across soil types, sometimes by several orders of magnitude. Soil properties such as fine particle percentage (<1 μm) and pH were crucial, while MnO₂ content had a greater effect on Pb due to its geochemical behavior. Incorporating aging time into the model improved predictions, explaining 3.6-7.5% of the variation, with the potential for a greater influence over longer contact times. This study emphasizes the importance of experimental conditions and soil-specific factors in accurately predicting heavy metal bioaccessibility in contaminated soils.

Keywords: Aging time; In vitro simulation; Machine Learning; Random Forest; Soil properties.
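
Illustrative sketch: the abstract describes fitting a Random Forest, scoring it by R² on a test set, and ranking the factors that drive bioaccessibility. The minimal Python example below shows one way such a workflow could look. It is not the authors' code or data. The data are synthetic, the feature names (`extraction_pH`, `chloride`, `irrelevant`) and their coefficients are invented for illustration, the forest is a simplified standard-library implementation, and permutation importance is used here as one common way to rank factors, which may differ from the attribution method used in the study.

```python
import random
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, rng, depth=0, max_depth=5, min_leaf=3, max_features=2):
    """Grow a regression tree; each split looks at a random feature subset."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return fmean(y)  # leaf: mean target of the samples that reach it
    best = None  # (score, feature index, threshold)
    for f in rng.sample(range(len(X[0])), max_features):
        column = [row[f] for row in X]
        # Try a handful of randomly chosen thresholds to keep the sketch fast.
        for thr in rng.sample(column, min(10, len(column))):
            left = [t for v, t in zip(column, y) if v <= thr]
            right = [t for v, t in zip(column, y) if v > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, thr)
    if best is None:
        return fmean(y)
    _, f, thr = best
    li = [i for i, row in enumerate(X) if row[f] <= thr]
    ri = [i for i, row in enumerate(X) if row[f] > thr]
    args = (rng, depth + 1, max_depth, min_leaf, max_features)
    return (f, thr,
            build_tree([X[i] for i in li], [y[i] for i in li], *args),
            build_tree([X[i] for i in ri], [y[i] for i in ri], *args))


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node


def fit_forest(X, y, rng, n_trees=30):
    """Train each tree on a bootstrap resample of the training data."""
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest


def predict(forest, X):
    """Average the predictions of all trees."""
    return [fmean([predict_tree(t, row) for t in forest]) for row in X]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = fmean(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot


# Synthetic data (ASSUMPTION: invented relationship, not from the study).
rng = random.Random(0)
FEATURES = ["extraction_pH", "chloride", "irrelevant"]
X, y = [], []
for _ in range(200):
    ph, cl, junk = rng.uniform(1.5, 7.0), rng.uniform(0, 1), rng.uniform(0, 1)
    X.append([ph, cl, junk])
    y.append(90 - 10 * ph + 15 * cl + rng.gauss(0, 3))

X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]
forest = fit_forest(X_train, y_train, rng)
test_r2 = r2_score(y_test, predict(forest, X_test))

# Permutation importance: drop in test R² when one feature column is shuffled.
importances = {}
for f, name in enumerate(FEATURES):
    shuffled = [row[f] for row in X_test]
    rng.shuffle(shuffled)
    X_perm = [row[:f] + [v] + row[f + 1:] for row, v in zip(X_test, shuffled)]
    importances[name] = test_r2 - r2_score(y_test, predict(forest, X_perm))

print(f"test R2 = {test_r2:.2f}")
for name, drop in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: R2 drop = {drop:.3f}")
```

In this toy setup the dominant synthetic driver should rank first because shuffling it degrades the test R² the most. With real measurements, a maintained library implementation and cross-validation would normally replace this hand-rolled forest.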