Distributed Conflict Resolution at High Traffic Densities with Reinforcement Learning

Marta Ribeiro; Joost Ellerbroek; Jacco Hoekstra

doi:10.3390/aerospace9090472

Corresponding author: Marta Ribeiro

Control & Simulation

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Future operations involving drones are expected to result in traffic densities that are orders of magnitude higher than any observed in manned aviation. Current geometric conflict resolution (CR) methods have proven to be very efficient at relatively moderate densities. However, at higher densities, performance is hindered by the unpredictable emergent behaviour from neighbouring aircraft. Reinforcement learning (RL) techniques are often capable of identifying emerging patterns through training in the environment. Although some work has started introducing RL to resolve conflicts and ensure separation between aircraft, it is not clear how to employ these methods with a higher number of aircraft, and whether these can compare to or even surpass the performance of current CR geometric methods. In this work, we employ an RL method for distributed conflict resolution; the method is completely responsible for guaranteeing minimum separation of all aircraft during operation. Two different action formulations are tested: (1) where the RL method controls heading, and speed variation; (2) where the RL method controls heading, speed, and altitude variation. The final safety values are directly compared to a state-of-the-art distributed CR algorithm, the Modified Voltage Potential (MVP) method. Although, overall, the RL method is not as efficient as MVP in reducing the total number of losses of minimum separation, its actions help identify favourable patterns to avoid conflicts. The RL method has a more preventive behaviour, defending in advance against nearby neighbouring aircraft not yet in conflict, and head-on conflicts while intruders are still far away.

Original language	English
Article number	472
Number of pages	22
Journal	Aerospace
Volume	9
Issue number	9
DOIs	https://doi.org/10.3390/aerospace9090472
Publication status	Published - 2022

Bibliographical note

https://github.com/TUDelft-CNS-ATM/bluesky

Keywords

air traffic control (ATC)
BlueSky ATC simulator
conflict detection and resolution (CD&R)
modified voltage potential (MVP)
reinforcement learning (RL)
self-separation
soft actor–critic (SAC)
U-space
velocity obstacles (VO)

Access to Document

10.3390/aerospace9090472

aerospace-09-00472 (Final published version, 1.9 MB; Licence: CC BY)

Cite this

@article{b46fe4f3aea746c78c3b48133bb651d5,

title = "Distributed Conflict Resolution at High Traffic Densities with Reinforcement Learning",

abstract = "Future operations involving drones are expected to result in traffic densities that are orders of magnitude higher than any observed in manned aviation. Current geometric conflict resolution (CR) methods have proven to be very efficient at relatively moderate densities. However, at higher densities, performance is hindered by the unpredictable emergent behaviour from neighbouring aircraft. Reinforcement learning (RL) techniques are often capable of identifying emerging patterns through training in the environment. Although some work has started introducing RL to resolve conflicts and ensure separation between aircraft, it is not clear how to employ these methods with a higher number of aircraft, and whether these can compare to or even surpass the performance of current CR geometric methods. In this work, we employ an RL method for distributed conflict resolution; the method is completely responsible for guaranteeing minimum separation of all aircraft during operation. Two different action formulations are tested: (1) where the RL method controls heading, and speed variation; (2) where the RL method controls heading, speed, and altitude variation. The final safety values are directly compared to a state-of-the-art distributed CR algorithm, the Modified Voltage Potential (MVP) method. Although, overall, the RL method is not as efficient as MVP in reducing the total number of losses of minimum separation, its actions help identify favourable patterns to avoid conflicts. The RL method has a more preventive behaviour, defending in advance against nearby neighbouring aircraft not yet in conflict, and head-on conflicts while intruders are still far away.",

keywords = "air traffic control (ATC), BlueSky ATC simulator, conflict detection and resolution (CD&R), modified voltage potential (MVP), reinforcement learning (RL), self-separation, soft actor–critic (SAC), U-space, velocity obstacles (VO)",

author = "Marta Ribeiro and Joost Ellerbroek and Jacco Hoekstra",

note = "https://github.com/TUDelft-CNS-ATM/bluesky",

year = "2022",

doi = "10.3390/aerospace9090472",

language = "English",

volume = "9",

journal = "Aerospace",

issn = "2226-4310",

publisher = "MDPI",

number = "9",

}

TY - JOUR

T1 - Distributed Conflict Resolution at High Traffic Densities with Reinforcement Learning

AU - Ribeiro, Marta

AU - Ellerbroek, Joost

AU - Hoekstra, Jacco

N1 - https://github.com/TUDelft-CNS-ATM/bluesky

PY - 2022

Y1 - 2022

N2 - Future operations involving drones are expected to result in traffic densities that are orders of magnitude higher than any observed in manned aviation. Current geometric conflict resolution (CR) methods have proven to be very efficient at relatively moderate densities. However, at higher densities, performance is hindered by the unpredictable emergent behaviour from neighbouring aircraft. Reinforcement learning (RL) techniques are often capable of identifying emerging patterns through training in the environment. Although some work has started introducing RL to resolve conflicts and ensure separation between aircraft, it is not clear how to employ these methods with a higher number of aircraft, and whether these can compare to or even surpass the performance of current CR geometric methods. In this work, we employ an RL method for distributed conflict resolution; the method is completely responsible for guaranteeing minimum separation of all aircraft during operation. Two different action formulations are tested: (1) where the RL method controls heading, and speed variation; (2) where the RL method controls heading, speed, and altitude variation. The final safety values are directly compared to a state-of-the-art distributed CR algorithm, the Modified Voltage Potential (MVP) method. Although, overall, the RL method is not as efficient as MVP in reducing the total number of losses of minimum separation, its actions help identify favourable patterns to avoid conflicts. The RL method has a more preventive behaviour, defending in advance against nearby neighbouring aircraft not yet in conflict, and head-on conflicts while intruders are still far away.

AB - Future operations involving drones are expected to result in traffic densities that are orders of magnitude higher than any observed in manned aviation. Current geometric conflict resolution (CR) methods have proven to be very efficient at relatively moderate densities. However, at higher densities, performance is hindered by the unpredictable emergent behaviour from neighbouring aircraft. Reinforcement learning (RL) techniques are often capable of identifying emerging patterns through training in the environment. Although some work has started introducing RL to resolve conflicts and ensure separation between aircraft, it is not clear how to employ these methods with a higher number of aircraft, and whether these can compare to or even surpass the performance of current CR geometric methods. In this work, we employ an RL method for distributed conflict resolution; the method is completely responsible for guaranteeing minimum separation of all aircraft during operation. Two different action formulations are tested: (1) where the RL method controls heading, and speed variation; (2) where the RL method controls heading, speed, and altitude variation. The final safety values are directly compared to a state-of-the-art distributed CR algorithm, the Modified Voltage Potential (MVP) method. Although, overall, the RL method is not as efficient as MVP in reducing the total number of losses of minimum separation, its actions help identify favourable patterns to avoid conflicts. The RL method has a more preventive behaviour, defending in advance against nearby neighbouring aircraft not yet in conflict, and head-on conflicts while intruders are still far away.

KW - air traffic control (ATC)

KW - BlueSky ATC simulator

KW - conflict detection and resolution (CD&R)

KW - modified voltage potential (MVP)

KW - reinforcement learning (RL)

KW - self-separation

KW - soft actor–critic (SAC)

KW - U-space

KW - velocity obstacles (VO)

UR - http://www.scopus.com/inward/record.url?scp=85138516918&partnerID=8YFLogxK

U2 - 10.3390/aerospace9090472

DO - 10.3390/aerospace9090472

M3 - Article

AN - SCOPUS:85138516918

SN - 2226-4310

VL - 9

JO - Aerospace

JF - Aerospace

IS - 9

M1 - 472

ER -

Datasets

Bluesky software: underlying the publication “Distributed Conflict Resolution at High Traffic Densities with Reinforcement Learning”

Cite this