TY - JOUR
T1 - A new Bayesian approach for managing bathing water quality at river bathing locations vulnerable to short-term pollution
AU - Seis, Wolfgang
AU - ten Veldhuis, Marie-Claire
AU - Rouault, Pascale
AU - Steffelbauer, David
AU - Medema, Gertjan
PY - 2024
Y1 - 2024
N2 - Short-term fecal pollution events are a major challenge for managing microbial safety at recreational waters. Long turnaround times of current laboratory methods for analyzing fecal indicator bacteria (FIB) delay water quality assessments. Data-driven models have been shown to be valuable approaches for enabling fast water quality assessments. However, a major barrier to the wider use of such models is the prevalent data scarcity at existing bathing waters, which calls into question the representativeness, and thus the usefulness, of such datasets for model training. The present study explores the ability of five data-driven modelling approaches to predict short-term fecal pollution episodes at recreational bathing locations in data-scarce situations and with imbalanced datasets. The study explicitly focuses on the potential benefits of adopting an innovative modelling and risk-based assessment approach, based on state/cluster-based Bayesian updating of FIB distributions in relation to different hydrological states. The models are benchmarked against commonly applied supervised learning approaches, particularly linear regression and random forests, as well as against a zero-model that closely resembles the current way of classifying bathing water quality in the European Union. For model-based clustering, we apply a non-parametric Bayesian approach based on a Dirichlet Process Mixture Model. The study tests and demonstrates the proposed approaches at three river bathing locations in Germany known to be influenced by short-term pollution events. At each river, two modelling experiments (“longest dry period”, “sequential model training”) are performed to explore how the different modelling approaches react and adapt to scarce and uninformative training data, i.e., datasets that do not include event pollution information in terms of elevated FIB concentrations.
We demonstrate that the proposed Bayesian approaches in particular are able to raise correct warnings in such situations (> 90 % true positive rate). The zero-model and random forest are shown to be unable to predict contamination episodes if pollution episodes are not present in the training data. Our research shows that the investigated Bayesian approaches reduce the risk of missed pollution events, thereby improving bathing water safety management. Additionally, the approaches provide a transparent solution for setting minimum data quality requirements under various conditions. The proposed approaches open the way for developing data-driven models for bathing water quality prediction in light of the reality that data scarcity is a common problem at existing and prospective bathing waters.
AB - Short-term fecal pollution events are a major challenge for managing microbial safety at recreational waters. Long turnaround times of current laboratory methods for analyzing fecal indicator bacteria (FIB) delay water quality assessments. Data-driven models have been shown to be valuable approaches for enabling fast water quality assessments. However, a major barrier to the wider use of such models is the prevalent data scarcity at existing bathing waters, which calls into question the representativeness, and thus the usefulness, of such datasets for model training. The present study explores the ability of five data-driven modelling approaches to predict short-term fecal pollution episodes at recreational bathing locations in data-scarce situations and with imbalanced datasets. The study explicitly focuses on the potential benefits of adopting an innovative modelling and risk-based assessment approach, based on state/cluster-based Bayesian updating of FIB distributions in relation to different hydrological states. The models are benchmarked against commonly applied supervised learning approaches, particularly linear regression and random forests, as well as against a zero-model that closely resembles the current way of classifying bathing water quality in the European Union. For model-based clustering, we apply a non-parametric Bayesian approach based on a Dirichlet Process Mixture Model. The study tests and demonstrates the proposed approaches at three river bathing locations in Germany known to be influenced by short-term pollution events. At each river, two modelling experiments (“longest dry period”, “sequential model training”) are performed to explore how the different modelling approaches react and adapt to scarce and uninformative training data, i.e., datasets that do not include event pollution information in terms of elevated FIB concentrations.
We demonstrate that the proposed Bayesian approaches in particular are able to raise correct warnings in such situations (> 90 % true positive rate). The zero-model and random forest are shown to be unable to predict contamination episodes if pollution episodes are not present in the training data. Our research shows that the investigated Bayesian approaches reduce the risk of missed pollution events, thereby improving bathing water safety management. Additionally, the approaches provide a transparent solution for setting minimum data quality requirements under various conditions. The proposed approaches open the way for developing data-driven models for bathing water quality prediction in light of the reality that data scarcity is a common problem at existing and prospective bathing waters.
KW - Dirichlet Process Mixture Model
KW - Probabilistic modelling
KW - Recreational waters
UR - http://www.scopus.com/inward/record.url?scp=85184566397&partnerID=8YFLogxK
U2 - 10.1016/j.watres.2024.121186
DO - 10.1016/j.watres.2024.121186
M3 - Article
AN - SCOPUS:85184566397
SN - 0043-1354
VL - 252
JO - Water Research
JF - Water Research
M1 - 121186
ER -