Machine learning methods to predict cadmium (Cd) concentration in rice grain and support soil management at a regional scale

Fundam Res. 2023 Mar 10;4(5):1196-1205. doi: 10.1016/j.fmre.2023.02.016. eCollection 2024 Sep.

Abstract

Rice is a major dietary source of the toxic metal cadmium (Cd). The concentration of Cd in rice grain varies widely at the regional scale, and it is difficult to predict grain Cd concentration from soil properties. The lack of reliable predictive models hampers the management of contaminated soils. Here, we conducted a three-year survey of 601 paired soil and rice samples at a regional scale. Approximately 78.3% of the soil samples exceeded the Chinese soil screening values for Cd, and 53.9% of the rice grain samples exceeded the Chinese maximum permissible limit for Cd. Predictive models were developed using multiple linear regression and machine learning methods. Rice grain Cd correlated poorly with soil total Cd concentration (R² < 0.17). Both the linear regression and the machine learning methods identified four key factors that significantly affect grain Cd concentration: Fe-Mn oxide bound Cd, soil pH, field soil moisture content, and soil reducible Mn concentration. The support vector machine model performed best in predicting grain Cd concentration at the regional scale (R² = 0.87), followed by the random forest model (R² = 0.67) and the back-propagation neural network model (R² = 0.64). Scenario simulations revealed that liming soil to a target pH of 6.5 could be one of the most cost-effective approaches to reducing the exceedance of Cd in rice grain. Taken together, these results show that machine learning methods can reliably predict Cd concentration in rice grain at a regional scale and can support soil management and safe rice production.
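The abstract compares models by R² and judges the liming scenario by the share of grain samples exceeding the Cd limit. The Python sketch below illustrates both calculations under stated assumptions. The 0.2 mg/kg value is the Chinese maximum permissible limit for Cd in rice, which the abstract refers to but does not state. `toy_model`, its coefficients, the predictor names `fe_mn_cd` and `pH`, and the three samples are all hypothetical, standing in for a fitted model such as the support vector machine. The scenario simply raises soil pH to 6.5 and holds every other predictor fixed, which may differ from how the authors ran their simulations.

```python
from typing import Callable, Dict, List, Sequence

# Chinese maximum permissible limit for Cd in rice grain, mg/kg.
RICE_CD_LIMIT = 0.2

Sample = Dict[str, float]


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def exceedance_rate(grain_cd: Sequence[float], limit: float = RICE_CD_LIMIT) -> float:
    """Fraction of grain Cd values (mg/kg) strictly above the permissible limit."""
    return sum(1 for c in grain_cd if c > limit) / len(grain_cd)


def liming_scenario(
    samples: List[Sample],
    model: Callable[[Sample], float],
    target_ph: float = 6.5,
) -> List[float]:
    """Raise soil pH to target_ph wherever it is lower, then re-predict grain Cd.

    All other predictors are held fixed. This is a simplification: in a real
    soil, liming also changes Cd speciation (e.g. Fe-Mn oxide bound Cd).
    """
    predictions = []
    for sample in samples:
        limed = dict(sample)  # copy so the input samples are not mutated
        limed["pH"] = max(sample["pH"], target_ph)
        predictions.append(model(limed))
    return predictions


def toy_model(sample: Sample) -> float:
    """HYPOTHETICAL stand-in for a fitted model (e.g. an SVM regressor).

    The coefficients are invented for illustration and are NOT from the paper.
    """
    return max(0.0, 0.5 * sample["fe_mn_cd"] + 0.1 * (6.5 - sample["pH"]))


# HYPOTHETICAL field samples: Fe-Mn oxide bound Cd (mg/kg) and soil pH.
SAMPLES: List[Sample] = [
    {"fe_mn_cd": 0.2, "pH": 5.0},
    {"fe_mn_cd": 0.3, "pH": 5.5},
    {"fe_mn_cd": 0.1, "pH": 7.0},
]

if __name__ == "__main__":
    baseline = [toy_model(s) for s in SAMPLES]
    limed = liming_scenario(SAMPLES, toy_model)
    print(f"baseline exceedance: {exceedance_rate(baseline):.2f}")  # 0.67
    print(f"after liming:        {exceedance_rate(limed):.2f}")     # 0.00
```

Holding the other predictors fixed is the simplest possible scenario design; a more faithful simulation would also model how liming shifts Fe-Mn oxide bound Cd and reducible Mn. For scale, the reported 53.9% exceedance rate corresponds to roughly 324 of the 601 grain samples.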

Keywords: Cadmium; Food safety; Heavy metals; Machine learning; Predictive model; Soil contamination.