Considering Airport Planners’ Preferences and Imbalanced Datasets when Predicting Flight Delays and Cancellations

Rik Hendrickx, Mike Zoutendijk, Mihaela Mitici, Jeffrey Schäfer

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review



A key part of efficient airport operational planning is to have insight into potential flight delays and cancellations. For airport planners, it is important to obtain flight delay or cancellation predictions with a high degree of certainty, i.e. a high precision. This allows planners to make sound decisions based on these predictions. To obtain such predictions, machine learning classification techniques are often applied. An important issue for classification problems is that of imbalanced class distributions: the number of actually cancelled/delayed flights is low. In general, the imbalance is addressed by resampling the data using one or more sampling techniques. However, resampling does not necessarily correspond to an imbalance ratio that leads to the best classification results. In this paper, a systematic approach is presented to deal with imbalanced data for classification problems, while taking into account the preferences of airport planners. A range of feasible imbalance ratios, together with several classification algorithms and sampling techniques, is considered. An optimal imbalance ratio is identified with respect to relevant performance metrics. The approach is illustrated by performing binary classification of flight cancellations and delays at a large European airport. The results show that the highest prediction precision is obtained using a base imbalance ratio, whereas a higher imbalance ratio is needed to obtain the highest F1-score. Specifically, the cancellation prediction performance is increased by up to 243%, while its optimal imbalance ratio does not correspond to resampling. In general, the results underline the need to investigate the influence of varying data imbalance ratios on the performance of classification algorithms.
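The idea of treating the imbalance ratio as a tunable quantity, scored on precision and F1, could be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names are hypothetical, the imbalance ratio is assumed here to mean the majority-to-minority count ratio, and random undersampling stands in for whichever sampling techniques the paper actually evaluates.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-to-minority count ratio (an assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def undersample_to_ratio(rows, labels, target_ratio, seed=0):
    """Randomly drop majority-class rows so that
    majority / minority is at most target_ratio (binary labels assumed)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    majority = max(counts, key=counts.get)
    minority_n = min(counts.values())
    # Never keep more majority rows than exist in the data.
    keep_n = min(counts[majority], int(target_ratio * minority_n))
    majority_idx = [i for i, y in enumerate(labels) if y == majority]
    kept = set(rng.sample(majority_idx, keep_n))
    idx = [i for i, y in enumerate(labels) if y != majority or i in kept]
    return [rows[i] for i in idx], [labels[i] for i in idx]


def precision_f1(y_true, y_pred, positive=1):
    """Precision and F1-score for the positive (delayed/cancelled) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, f1
```

In a sweep, one would call `undersample_to_ratio` on the training split for each candidate ratio, fit a classifier of choice on the result, and compare `precision_f1` on a test split left at its original class distribution.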
Original language: English
Title of host publication: 2021 IEEE/AIAA 40th Digital Avionics Systems Conference (DASC)
Subtitle of host publication: Proceedings
Place of Publication: Piscataway
Number of pages: 10
ISBN (Electronic): 978-1-6654-3420-1
ISBN (Print): 978-1-6654-3421-8
Publication status: Published - 2021
Event: 2021 IEEE/AIAA 40th Digital Avionics Systems Conference (DASC) - Hybrid at San Antonio, United States
Duration: 3 Oct 2021 → 7 Oct 2021
Conference number: 40th


Conference: 2021 IEEE/AIAA 40th Digital Avionics Systems Conference (DASC)
Abbreviated title: DASC 2021
Country: United States
City: Hybrid at San Antonio


  • flight delay
  • machine learning
  • imbalance
  • classification


