Policy Analysis of Safe Vertical Manoeuvring using Reinforcement Learning: Identifying when to Act and when to stay Idle

Research output: Contribution to conference › Paper › peer-review


The number of unmanned aircraft operating in the airspace is expected to grow exponentially over the coming decades. This will likely lead to traffic densities higher than those currently observed in civil and general aviation, and might require both a different airspace structure from that of conventional aviation and different conflict resolution methods. One of the main disadvantages of analytical conflict resolution methods in high-traffic-density scenarios is that they can destabilise the airspace through a domino effect of secondary conflicts. Many studies have therefore investigated other methods of conflict resolution, such as Deep Reinforcement Learning, which have shown positive results but tend to be hard to explain due to their black-box nature. This paper investigates whether it is possible to explain the behaviour of a Soft Actor-Critic model, trained to resolve vertical conflicts in a layered urban airspace, by interpreting the policy through a heat map of the selected actions. It was found that the model actively changes its policy depending on the degrees of freedom and tends to adopt preventive behaviour on top of conflict resolution. This behaviour can be directly linked to a decrease in secondary conflicts compared to analytical methods, and can potentially be incorporated into these methods to improve them while maintaining explainability.
Original language: English
Number of pages: 9
Publication status: Published - 2023
Event: 13th SESAR Innovation Days - Sevilla, Spain
Duration: 27 Nov 2023 to 30 Nov 2023
Conference number: 13


Conference: 13th SESAR Innovation Days


  • Air Traffic Control
  • Unmanned Traffic Management
  • Reinforcement Learning
  • Policy Analysis
  • Artificial Intelligence
  • Explainable AI

