Evaluating the generalizability and transferability of water distribution deterioration models

Shamsuddin Daulat*, Marius Møller Rokstad, Stian Bruaset, Jeroen Langeveld, Franz Tscheikner-Gratl

*Corresponding author for this work

Research output: Contribution to journal, Article, Scientific, peer-review


Abstract

Small utilities often lack enough data to train machine learning-based models for predicting pipe failures, and hence cannot harness the predictive power of machine learning. This study evaluates the generalizability and transferability of a machine learning model to determine whether small utilities can benefit from the data and models of other utilities. Using nine Norwegian utilities’ datasets, we trained nine global models (by merging multiple datasets) and nine local models (by utilizing each utility's dataset) using random survival forest. Several pre-processing techniques, including ways to address left-truncated break data and break data scarcity, are also presented. The global models and three of the local models were tested on predicting pipe failures in utilities that were not included in their training datasets. The results indicate that the global models can predict pipe failures in other utilities with sufficient accuracy, while local models have some limitations. However, if a representative utility with a sufficiently large (and information-rich) dataset is selected, its model can predict other utilities' pipe breaks as accurately as the global models. Furthermore, survival curves for defined cohorts (used as proxies for uncertainty) and variable importance show that pipes with and without previous breaks behave extremely differently. With an understanding of the models’ generalizability and transferability, small utilities can benefit from the data and models of other utilities.
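
The global versus local setup described in the abstract can be illustrated with a minimal sketch. It assumes that each global model is trained on the merged data of every utility except one held-out utility; the abstract states only that global models merge multiple datasets and are tested on utilities outside their training data, so this split is an assumption. The function name, record fields, and toy values are hypothetical, and no survival model is fitted here.

```python
from typing import Dict, List, Tuple

Record = dict  # one pipe record; field names below are hypothetical


def leave_one_utility_out(
    datasets: Dict[str, List[Record]],
) -> List[Tuple[str, List[Record], List[Record]]]:
    """Return one (held_out_name, merged_train, test) triple per utility.

    Assumption: a "global" training set is the merge of all utilities
    except the held-out one, which then serves as the test set.
    """
    splits = []
    for held_out in sorted(datasets):
        # Merge the records of every utility other than the held-out one.
        merged_train = [
            rec
            for name in sorted(datasets)
            if name != held_out
            for rec in datasets[name]
        ]
        splits.append((held_out, merged_train, list(datasets[held_out])))
    return splits


# Toy data standing in for utility datasets (values are made up).
toy = {
    "utility_A": [{"pipe_id": 1, "age_years": 40, "had_break": True}],
    "utility_B": [
        {"pipe_id": 2, "age_years": 12, "had_break": False},
        {"pipe_id": 3, "age_years": 55, "had_break": True},
    ],
    "utility_C": [{"pipe_id": 4, "age_years": 30, "had_break": False}],
}

splits = leave_one_utility_out(toy)
for held_out, train, test in splits:
    print(held_out, "train size:", len(train), "test size:", len(test))
```

A local model would instead use a single utility's records as the training set and another utility's records as the test set. In either case, a random survival forest implementation would then be fitted on the training records.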

Original language: English
Article number: 109611
Number of pages: 19
Journal: Reliability Engineering and System Safety
Volume: 241
Publication status: Published - 2023

Keywords

  • Data preprocessing
  • Random survival forests
  • Survival functions
  • Uncertainties
  • Variable importance
