Identification of soil parent materials in naturally high background areas based on machine learning

Sci Total Environ. 2023 Jun 1:875:162684. doi: 10.1016/j.scitotenv.2023.162684. Epub 2023 Mar 8.

Abstract

Recently, farmlands with a high geological background of Cd derived from carbonate rock areas (CA) and black shale areas (BA) have received wide attention. Although both CA and BA are high geological background areas, the mobility of soil Cd differs significantly between them. Together with the difficulty of reaching the parent material in deep soil, this makes land use planning in high geological background areas challenging. This study seeks to determine the key soil geochemical parameters related to the spatial patterns of lithology and the main factors influencing the geochemical behavior of soil Cd, and ultimately uses these parameters with machine-learning methods to identify CA and BA. In total, 10,814 and 4,323 surface soil samples were collected from CA and BA, respectively. Hot spot analysis revealed that soil properties and soil Cd, except for TOC and S, were significantly correlated with the underlying bedrock. Further analysis confirmed that the concentration and mobility of Cd in high geological background areas were mainly affected by pH and Mn. The soil parent materials were then predicted using artificial neural network (ANN), random forest (RF) and support vector machine (SVM) models. The ANN and RF models achieved higher Kappa coefficients and overall accuracies than the SVM model, suggesting that ANN and RF models have the potential to predict soil parent materials from soil data, which might help in ensuring safe land use and coordinating activities in high geological background areas.
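The two metrics used to compare the models, overall accuracy and the Kappa coefficient, can be illustrated with a minimal Python sketch. It assumes the standard definition of Cohen's kappa, $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_o$ is the observed agreement and $p_e$ is the agreement expected by chance; the abstract does not state how the metrics were computed. The function name `overall_accuracy_and_kappa` and the ten example labels are hypothetical and are not data from the study.

```python
from collections import Counter


def overall_accuracy_and_kappa(y_true, y_pred):
    """Return (overall accuracy, Cohen's kappa) for two equal-length label lists."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(y_true)

    # Observed agreement p_o: share of samples whose predicted class
    # matches the true class. This is also the overall accuracy.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement p_e: for each class, the product of its share among
    # the true labels and its share among the predicted labels, summed.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if p_e == 1.0:
        # Only one class appears in both lists, so kappa is undefined.
        return p_o, float("nan")
    return p_o, (p_o - p_e) / (1.0 - p_e)


# Hypothetical labels for ten samples (illustrative only).
y_true = ["CA"] * 6 + ["BA"] * 4
y_pred = ["CA"] * 5 + ["BA"] + ["BA"] * 3 + ["CA"]
accuracy, kappa = overall_accuracy_and_kappa(y_true, y_pred)
print(f"overall accuracy = {accuracy:.3f}, kappa = {kappa:.3f}")
```

Kappa is a useful complement to accuracy here because the two classes are unbalanced (10,814 CA samples against 4,323 BA samples): a model that always predicted CA would reach a high overall accuracy but a kappa of zero.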

Keywords: High geological background; Hot spot analysis; Machine learning; Parent material; Soil cadmium.