Machine learning predictive insight of water pollution and groundwater quality in the Eastern Province of Saudi Arabia

Sci Rep. 2024 Aug 28;14(1):20031. doi: 10.1038/s41598-024-70610-4.

Abstract

This study presents an innovative approach for predicting water and groundwater quality indices (WQI and GWQI) in the Eastern Province of Saudi Arabia, addressing the critical challenges of water scarcity and pollution in arid regions. Recent literature highlights growing attention to the WQI, here based on the water pollution index (WPI), and to the GWQI as essential tools for simplifying complex hydrogeological data, thereby facilitating effective groundwater management and protection. Unlike previous work, the present research introduces a novel hybrid method that integrates non-parametric kernel-based Gaussian process regression (GPR), an adaptive neuro-fuzzy inference system (ANFIS), and decision tree (DT) algorithms. This approach marks the first application of a non-parametric kernel method to groundwater quality pollution index prediction in Saudi Arabia, offering a significant advancement in the field. By combining laboratory analysis with several machine learning (ML) techniques, the study enhances prediction capabilities, particularly for unmonitored sites in arid and semi-arid regions. The study's objectives are to perform feature engineering based on dependency sensitivity analysis, identifying the variables that most influence WQI and GWQI, and to develop predictive models for both indices using ANFIS, GPR, and DT. It also assesses how the division of data between training and testing affects WQI and GWQI predictions, comparing splits of 70%/30%, 60%/40%, and 80%/20% for the training and testing phases, respectively. By filling a critical gap in water resource management, this research has significant implications for water quality prediction in regions facing similar environmental challenges and contributes to the broader effort of managing and protecting water resources in arid and semi-arid areas.
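The abstract does not say which dependency measure the sensitivity analysis uses. As a minimal sketch, assuming the absolute Pearson correlation between each candidate variable and the quality index serves as the dependency score (a stand-in chosen for illustration, not necessarily the authors' measure), ranking and selecting the most influential variables could look like this. The helper names (`pearson`, `rank_features`, `select_top`), feature names, and values are hypothetical toy inputs, not data from the study.

```python
import math


def pearson(x, y):
    # Pearson correlation coefficient between two equal-length sequences
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear dependency information
    return sxy / math.sqrt(sxx * syy)


def rank_features(columns, target):
    # Assumption: |Pearson r| is used as the dependency score (illustrative stand-in).
    # columns maps feature name -> list of values; returns (name, score) pairs,
    # sorted from most to least influential.
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def select_top(columns, target, k):
    # Keep the k features with the strongest dependency on the target index
    return [name for name, _ in rank_features(columns, target)[:k]]


if __name__ == "__main__":
    # Hypothetical toy data for illustration only; not measurements from the study
    target = [1.0, 2.0, 3.0, 4.0, 5.0]
    columns = {
        "feature_a": [2.0, 4.1, 5.9, 8.2, 10.0],  # nearly linear in the target
        "feature_b": [5.0, 3.0, 4.0, 1.0, 2.0],   # inversely related
        "feature_c": [1.0, 1.0, 1.0, 1.0, 1.0],   # constant, no information
    }
    print(rank_features(columns, target))
    print(select_top(columns, target, 2))
```

Correlation only captures linear dependency; a nonlinear score such as mutual information could be swapped into `rank_features` without changing the selection logic.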
The results showed that GPR-M1 achieved exceptional testing-phase accuracy for GWQI (RMSE = 0.0169). Similarly, for the WPI, ANFIS-M1 achieved high predictive skill in the testing phase (RMSE = 0.0401). These results emphasize the critical role of training data quality and quantity in enhancing model robustness and prediction precision in water quality assessment.
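To illustrate how a kernel model, the three data divisions, and the RMSE metric fit together, here is a minimal pure-Python sketch of Gaussian process regression (zero-mean prior, posterior mean only, squared-exponential kernel) evaluated on 70/30, 60/40, and 80/20 train/test splits. The kernel choice, the hyperparameters (`length_scale`, `variance`, `noise`), all helper names, and the synthetic dataset are assumptions for illustration; the sketch does not reproduce the authors' GPR-M1 model, its inputs, or its reported RMSE values.

```python
import math
import random


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    # Squared-exponential (RBF) kernel between two feature vectors (assumed kernel)
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return variance * math.exp(-0.5 * sq_dist / length_scale ** 2)


def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def gpr_fit_predict(X_train, y_train, X_test, noise=1e-2, **kernel_kw):
    # Zero-mean GP posterior mean: k(X*, X) (K + noise * I)^-1 y
    n = len(X_train)
    K = [[rbf_kernel(X_train[i], X_train[j], **kernel_kw) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, list(y_train))
    return [sum(rbf_kernel(x, X_train[j], **kernel_kw) * alpha[j] for j in range(n))
            for x in X_test]


def split(X, y, train_frac, seed=0):
    # Shuffle indices reproducibly, then cut into training and testing portions
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_frac * len(X)))
    tr, te = idx[:cut], idx[cut:]
    return [X[i] for i in tr], [y[i] for i in tr], [X[i] for i in te], [y[i] for i in te]


def rmse(y_true, y_pred):
    # Root-mean-square error between observed and predicted index values
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def evaluate_splits(X, y, ratios=((70, 30), (60, 40), (80, 20))):
    # Testing-phase RMSE for each (train %, test %) division
    results = {}
    for tr_pct, te_pct in ratios:
        X_tr, y_tr, X_te, y_te = split(X, y, tr_pct / 100.0)
        results[(tr_pct, te_pct)] = rmse(y_te, gpr_fit_predict(X_tr, y_tr, X_te))
    return results


if __name__ == "__main__":
    # Synthetic, hypothetical data: two features and a smooth target index
    X = [[i / 20.0, (i % 7) / 7.0] for i in range(50)]
    y = [math.sin(2 * a) + 0.5 * b for a, b in X]
    for ratio, err in evaluate_splits(X, y).items():
        print(ratio, round(err, 4))
```

In practice the kernel hyperparameters would be tuned (for example by maximizing the marginal likelihood) and the inputs scaled before fitting; they are fixed here to keep the sketch short.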

Keywords: Eastern Province; Environmental monitoring; Groundwater quality; Machine learning; Saudi Arabia; Water pollution.