The successful use of visible and near-infrared (Vis-NIR) reflectance spectroscopy requires selecting an optimal data acquisition procedure and an accurate modeling approach. In this study, Vis-NIR spectroscopy (350-2500 nm) was applied to detect different forms of lead (Pb) through the spectrally active soil constituents, combining principal component regression (PCR) and partial least-squares regression (PLSR) for model calibration. Linear discriminant analysis (LDA) separated three clusters of soils with different spectral properties into categories of Pb contamination risk ("low," "health," and "ecological"), ranging from 200 to 750 mg kg⁻¹. Farm soils (n = 26) were used for calibration, and more polluted garden soils from New York City (n = 36) were used for validation. Total and bioaccessible Pb concentrations were modeled with PLSR and compared with support vector machine (SVM) regression and boosting regression tree (BRT) models. The predictive performance of all models was evaluated by the root mean square error (RMSE), residual prediction deviation (RPD), and coefficient of determination (R²). For total Pb, the best predictive models were obtained with BRT (R² = 0.82 and RMSE = 341.80 mg kg⁻¹), followed by SVM (validation, R² = 0.77 and RMSE = 337.96 mg kg⁻¹), and lastly by PLSR (validation, R² = 0.74 and RMSE = 499.04 mg kg⁻¹). For bioaccessible Pb, PLSR was the most accurate calibration model, with an R² of 0.91 and an RMSE of 68.27 mg kg⁻¹. The regression analysis indicated that bioaccessible Pb is strongly influenced by organic content and, to a lesser extent, by Fe concentrations. Although PLSR obtained lower accuracy for total Pb, the model selected many characteristic bands and thus provided an accurate approach for Pb pollution monitoring.
Keywords: Boosting Regression Tree; Chemometric; Lead; Partial least-squares regression; Support vector machine; Urban soil; Vis-NIR spectroscopy.
Copyright © 2021 Elsevier B.V. All rights reserved.
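The three evaluation metrics named in the abstract (RMSE, RPD, and R²) can be illustrated with a minimal sketch. The definitions below are common chemometric conventions and are assumptions here: RPD is taken as the sample standard deviation of the observed values divided by the RMSE, and R² as one minus the ratio of residual to total sum of squares. The abstract does not state which variants the authors used. The function names and sample values are hypothetical and are not data from the study.

```python
import math
from statistics import mean, stdev


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(mean(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rpd(observed, predicted):
    """Residual prediction deviation as SD(observed) / RMSE (assumed definition)."""
    return stdev(observed) / rmse(observed, predicted)


# Hypothetical Pb concentrations in mg/kg, for illustration only.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

print("RMSE:", rmse(observed, predicted))
print("R2:  ", r_squared(observed, predicted))
print("RPD: ", rpd(observed, predicted))
```

Under these definitions a higher RPD means the prediction error is small relative to the natural spread of the reference values, which is why it is often reported alongside RMSE when comparing models across datasets with different concentration ranges.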