Faecal contamination is a widespread environmental and public health problem on recreational beaches around the world. The World Health Organization has recommended predictive models as a complement to traditional monitoring to assist decision-makers and reduce health risks. Despite several advances in the modeling of faecal coliforms, machine learning tools and algorithms are still scarcely used in the field, and their implementation in nowcast systems lags behind. Here, we review the literature of the last two decades on modeling strategies to predict faecal contamination at recreational beaches and on the implementation of models in nowcast systems to aid management. Models constructed for surface waters of continental (lakes, rivers and streams), estuarine and marine coastal ecosystems were analyzed and compared based on performance metrics for continuous (i.e. regression; R², Root Mean Square Error: RMSE) and categorical (i.e. classification; accuracy, sensitivity, specificity) responses. We found 67 articles matching the search criteria, 40 of which provided enough information to evaluate and compare predictive ability. In the early 2000s, Multiple Linear Regressions were common, followed by a peak of Artificial Neural Networks (ANNs) from 2010 to 2015 and, since 2015, the rise of machine learning techniques such as decision trees (CART and Random Forest). ANNs and decision trees presented better accuracy than the remaining models. Rainfall and its lags were important predictor variables, followed by water temperature. Specificity was much higher than sensitivity in all modeling strategies, which is typical of data sets where one category (e.g. closed beach) is far less common than the normal state (i.e. unbalanced data sets).
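To illustrate why specificity can far exceed sensitivity on unbalanced data, the following minimal sketch computes the three classification metrics named above from predicted and observed labels. The data are hypothetical (5 exceedance days out of 100) and the function name `classification_metrics` is an assumption introduced here for illustration; it is not taken from any of the reviewed studies.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        # Sensitivity: share of true exceedances that the model detects.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: share of normal days correctly left "open".
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical, unbalanced data: 1 = exceedance (closed beach), 0 = normal.
y_true = [1] * 5 + [0] * 95
# A trivial model that always predicts the common "normal" state.
y_pred = [0] * 100

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this contrived case the trivial model reaches high accuracy and perfect specificity while detecting none of the exceedances, which is why accuracy alone can be misleading when closures are rare.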
We recorded the implementation of statistical models in early warning systems in 6 countries, mainly by public beach quality management institutions, followed by NGOs in conjunction with universities. We identified critical steps towards improving model construction, evaluation and usage: i) the need to balance the data set prior to model training, ii) the need to split the data set into training, validation and test sets to perform an honest evaluation of model performance and iii) the translation of model outputs into plain language for relevant stakeholders. Integrating in situ monitoring, model construction and nowcasting systems into a single framework could help improve decision-making systems that protect users from bathing in contaminated waters. Still, reducing the arrival of faecal coliforms to aquatic ecosystems (e.g. by improving sewage treatment systems) will be the ultimate factor in reducing health risk.
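Steps i) and ii) can be sketched as follows, under stated assumptions. The review does not prescribe a balancing method, so random oversampling of the minority class is used here only as one possible choice. The 60/20/20 split, the field names `rain_mm` and `exceed`, and both function names are hypothetical. Balancing is applied to the training set only, so that the validation and test sets keep the natural class frequencies.

```python
import random


def stratified_split(records, label_key, percents=(60, 20, 20), seed=0):
    """Split records into train/validation/test sets, class by class."""
    rng = random.Random(seed)
    by_class = {}
    for rec in records:
        by_class.setdefault(rec[label_key], []).append(rec)
    train, val, test = [], [], []
    for group in by_class.values():
        group = group[:]  # copy so the caller's list is not reordered
        rng.shuffle(group)
        n_train = len(group) * percents[0] // 100
        n_val = len(group) * percents[1] // 100
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def oversample_minority(train, label_key, seed=0):
    """Resample smaller classes with replacement up to the largest class size."""
    rng = random.Random(seed)
    by_class = {}
    for rec in train:
        by_class.setdefault(rec[label_key], []).append(rec)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced += group
        balanced += [rng.choice(group) for _ in range(target - len(group))]
    rng.shuffle(balanced)
    return balanced


# Hypothetical monitoring records: 10 exceedances among 100 samples.
records = [
    {"rain_mm": float(i % 7), "exceed": 1 if i < 10 else 0}
    for i in range(100)
]

train, val, test = stratified_split(records, "exceed")
balanced_train = oversample_minority(train, "exceed")
print(len(train), len(val), len(test), len(balanced_train))
```

A model would then be fitted on `balanced_train`, tuned on `val`, and scored once on `test`, which is what makes the final performance estimate honest.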
Keywords: Faecal indicator bacteria; Machine learning techniques; Nowcast systems; Recreational beaches; Statistical modeling.
Copyright © 2024 Elsevier B.V. All rights reserved.