A data-driven framework to identify influencing factors for soil heavy metal contaminations using random forest and bivariate local Moran's I: A case study

J Environ Manage. 2025 Jan 21:375:124172. doi: 10.1016/j.jenvman.2025.124172. Online ahead of print.

Abstract

The efficacy of traceability analysis is often limited by a lack of information on the factors influencing heavy metal (HM) contamination in soil, such as the spatial correlations between HM concentrations and those factors. To overcome this limitation, a novel data-driven framework was established to identify influencing factors for soil HM concentrations in an industrialised study area in Guangdong Province, China. The framework relies mainly on random forest (RF) and bivariate local Moran's I (BLMI), applied to 577 soil samples and 18 environmental covariates. The quantitative contributions of the 18 influencing factors to the Cd, As, Pb, and Cr concentrations were determined with the optimised RF. The main influencing factors were petrol stations (10.97%) and railways (9.99%) for Cd, groundwater depth (8.45%) and elevation (8.24%) for As, soil pH (8.82%) and hazardous waste disposal sites (8.02%) for Pb, and mine tailings (13.65%) and rainfall (11.88%) for Cr. BLMI was then used to generate eight spatial clustering maps relating the four HM concentrations to their two key influencing factors. The middle part of the study area showed higher concentrations of Cd, As, Pb, and Cr, more complex human activities, and more high-high clusters. The middle part should therefore receive priority when specific prevention and control measures are taken against HM contamination. This data-driven framework provided rich information on influencing factors, including HM concentrations, HM contamination, quantitative contributions, and qualitative spatial clusters.
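As an illustration of the BLMI step, the statistic for sample i is commonly written as I_i = z_x,i · Σ_j w_ij · z_y,j, where z_x is the standardised HM concentration, z_y the standardised influencing factor, and w_ij a row-standardised spatial weight. The following is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the weighting scheme, so k-nearest-neighbour weights are assumed here, the function names are hypothetical, and the permutation test normally used to judge cluster significance is omitted.

```python
from math import sqrt


def standardize(values):
    """Return z-scores using the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("cannot standardise a constant variable")
    return [(v - mean) / sd for v in values]


def knn_weights(coords, k):
    """Row-standardised k-nearest-neighbour weights (assumed scheme).

    Returns one dict per sample mapping neighbour index -> weight 1/k.
    """
    weights = []
    for i, (xi, yi) in enumerate(coords):
        dists = sorted(
            ((xi - xj) ** 2 + (yi - yj) ** 2, j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        weights.append({j: 1.0 / k for _, j in dists[:k]})
    return weights


def bivariate_local_moran(hm, factor, weights):
    """Compute I_i = z_hm[i] * sum_j w_ij * z_factor[j] for every sample.

    Each result is (statistic, quadrant label); the label combines the
    sign of the local HM z-score with the sign of the spatially lagged
    factor, e.g. "high-high". No significance test is performed.
    """
    z_hm = standardize(hm)
    z_factor = standardize(factor)
    results = []
    for i, w_i in enumerate(weights):
        lag = sum(w * z_factor[j] for j, w in w_i.items())
        label = (
            ("high" if z_hm[i] > 0 else "low")
            + "-"
            + ("high" if lag > 0 else "low")
        )
        results.append((z_hm[i] * lag, label))
    return results
```

In this sketch, a sample labelled "high-high" has an above-average HM concentration and is surrounded by above-average values of the influencing factor, which corresponds to the high-high clusters mentioned in the abstract. In practice a dedicated spatial statistics library would also supply the significance testing omitted here.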

Keywords: Bivariate local Moran's I; Heavy metals; Influencing factors; Random forest; Soil.