Abstract
A key part of efficient airport operational planning is insight into potential flight delays and cancellations. Airport planners need flight delay or cancellation predictions with a high degree of certainty, i.e., with high precision, so that they can make sound decisions based on these predictions. To obtain such predictions, machine learning classification techniques are often applied. An important issue for classification problems is that of imbalanced class distributions: the number of flights that are actually cancelled or delayed is low. In general, the imbalance is addressed by resampling the data using one or more sampling techniques. However, the imbalance ratio obtained by resampling is not necessarily the one that leads to the best classification results. This paper presents a systematic approach to dealing with imbalanced data in classification problems while taking into account the preferences of airport planners. A range of feasible imbalance ratios is considered, together with several classification algorithms and sampling techniques. An optimal imbalance ratio is identified with respect to relevant performance metrics. The approach is illustrated by performing binary classification of flight cancellations and delays at a large European airport. The results show that the highest prediction precision is obtained using a base imbalance ratio, whereas a higher imbalance ratio is needed to obtain the highest F1-score. Specifically, the cancellation prediction performance is increased by up to 243%, while its optimal imbalance ratio does not correspond to resampling. In general, the results underline the need to investigate the influence of varying data imbalance ratios on the performance of classification algorithms.
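The idea of sweeping over candidate imbalance ratios and scoring each one by precision and F1-score can be illustrated with a minimal sketch. It is not the paper's implementation. It rests on several assumptions: the imbalance ratio is taken to be the majority-to-minority count ratio, random undersampling stands in for the sampling techniques the abstract mentions, and the names `imbalance_ratio`, `undersample_to_ratio`, `precision_and_f1`, `sweep_ratios`, and the `fit` interface are hypothetical.

```python
import random


def imbalance_ratio(labels):
    """Majority-to-minority count ratio (an assumed definition)."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    return max(pos, neg) / min(pos, neg)


def undersample_to_ratio(rows, labels, target_ratio, seed=0):
    """Randomly drop label-0 rows so that count(0) / count(1) is about target_ratio.

    Assumes label 1 (cancelled/delayed) is the minority class.
    """
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    keep_neg = min(len(neg_idx), int(round(target_ratio * len(pos_idx))))
    chosen = sorted(pos_idx + rng.sample(neg_idx, keep_neg))
    return [rows[i] for i in chosen], [labels[i] for i in chosen]


def precision_and_f1(y_true, y_pred):
    """Precision and F1-score for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, f1


def sweep_ratios(train_rows, train_labels, test_rows, test_labels, ratios, fit):
    """Score one classifier at each candidate imbalance ratio.

    `fit(rows, labels)` is a hypothetical interface that returns a
    callable mapping a single row to a predicted label (0 or 1).
    The test set is left at its original class distribution.
    """
    results = {}
    for ratio in ratios:
        rows_r, labels_r = undersample_to_ratio(train_rows, train_labels, ratio)
        predict = fit(rows_r, labels_r)
        predictions = [predict(x) for x in test_rows]
        results[ratio] = precision_and_f1(test_labels, predictions)
    return results
```

A planner who prioritises certainty would pick the ratio with the highest precision in `results`, whereas a balanced objective would pick the ratio with the highest F1-score; the two choices need not coincide.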
Original language | English |
---|---|
Title of host publication | 40th Digital Avionics Systems Conference, DASC 2021 - Proceedings |
Subtitle of host publication | Proceedings |
Place of Publication | Piscataway |
Publisher | IEEE |
Number of pages | 10 |
ISBN (Electronic) | 978-1-6654-3420-1 |
ISBN (Print) | 978-1-6654-3421-8 |
Publication status | Published - 2021 |
Event | 2021 IEEE/AIAA 40th Digital Avionics Systems Conference (DASC) - Hybrid at San Antonio, United States; Duration: 3 Oct 2021 → 7 Oct 2021; Conference number: 40th |
Publication series
Name | AIAA/IEEE Digital Avionics Systems Conference - Proceedings |
---|---|
Volume | 2021-October |
ISSN (Print) | 2155-7195 |
ISSN (Electronic) | 2155-7209 |
Conference
Conference | 2021 IEEE/AIAA 40th Digital Avionics Systems Conference (DASC) |
---|---|
Abbreviated title | DASC 2021 |
Country/Territory | United States |
City | Hybrid at San Antonio |
Period | 3/10/21 → 7/10/21 |
Keywords
- flight delay
- machine learning
- imbalance
- classification