Robust and automatic data cleansing method for short-term load forecasting of distribution feeders

Nathalie Huyghues-Beaufond, Simon Tindemans, Paola Falugi, Mingyang Sun, Goran Strbac

Research output: Contribution to journal › Article › peer-review



Distribution networks are undergoing fundamental changes at the medium-voltage level. To support growing planning and control decision-making, the need for large numbers of short-term load forecasts has emerged. Data-driven modelling of medium-voltage feeders can be affected by (1) data quality issues, namely large gross errors and missing observations, and (2) the presence of structural breaks in the data due to occasional network reconfiguration and load transfers. The present work investigates and reports on the effects of advanced data cleansing techniques on forecast accuracy. A hybrid framework to detect and remove outliers in large datasets is proposed; this automatic procedure combines the Tukey labelling rule and the binary segmentation algorithm to cleanse data more efficiently, and it is fast and easy to implement. Various approaches for missing value imputation are investigated, including unconditional mean, Hot Deck via k-nearest neighbour (k-NN), and Kalman smoothing. A combination of the automatic detection/removal of outliers and the imputation methods mentioned above is implemented to cleanse time series of 342 medium-voltage feeders. A nested rolling-origin-validation technique is used to evaluate the feed-forward deep neural network models. The proposed data cleansing framework efficiently removes outliers from the data, and the accuracy of forecasts is improved. It is found that Hot Deck (k-NN) imputation performs best in balancing the bias-variance trade-off for short-term forecasting.
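As a rough illustration of two of the ingredients named in the abstract, the sketch below flags outliers with the Tukey labelling rule and then fills the resulting gaps with the unconditional mean. It is not the authors' implementation: the binary segmentation step that the paper uses to handle structural breaks is omitted, so the fences are computed once over the whole series. The function names, the sample `load` series, and the multiplier `k = 1.5` (the conventional choice for Tukey fences) are assumptions made for this example.

```python
from statistics import mean, quantiles


def tukey_fences(values, k=1.5):
    """Return (lower, upper) Tukey fences from the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def cleanse(values, k=1.5):
    """Mark points outside the fences as missing, then impute with the mean."""
    lower, upper = tukey_fences(values, k)
    # Outliers and originally missing points are both represented as None.
    flagged = [v if v is not None and lower <= v <= upper else None
               for v in values]
    # Unconditional mean of the remaining observations.
    fill = mean(v for v in flagged if v is not None)
    return [fill if v is None else v for v in flagged]


# Hypothetical feeder load readings with one gap and one gross error.
load = [10.0, 11.0, 12.0, None, 11.0, 10.0, 95.0, 12.0, 11.0]
cleaned = cleanse(load)
print(cleaned)
```

In this toy series the reading of 95.0 falls above the upper fence and is replaced, together with the missing point, by the mean of the remaining observations. A series with a structural break would need the fences computed per segment, which is the role binary segmentation plays in the proposed framework.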
Original language: English
Article number: 114405
Pages (from-to): 1-17
Number of pages: 17
Journal: Applied Energy
Publication status: Published - 2020

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project

Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.


Keywords

  • Binary segmentation
  • Distribution systems
  • Kalman smoothing
  • Multi-step forecasts
  • Outlier detection

