Determining Optimal Conflict Avoidance Manoeuvres At High Densities With Reinforcement Learning

M.J. Ribeiro; J. Ellerbroek; J.M. Hoekstra

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific

Abstract

The use of drones for applications such as package delivery, in an urban setting, would result in traffic densities that are orders of magnitude higher than any observed in manned aviation. Current geometric resolution models have proven to be very efficient at relatively moderate densities. However, at higher densities, performance is hindered by the unpredictable emergent behaviour from neighbouring aircraft. In this paper, we use a hybrid solution between existing geometric resolution approaches and reinforcement learning (RL), directed at improving conflict resolution performance at high densities. We resort to a Deep Deterministic Policy Gradient (DDPG) model to improve the behaviour of the Modified Voltage Potential (MVP) geometric conflict resolution method. By default, the MVP method generates avoidance manoeuvres of a geometrically-defined type, using a fixed look-ahead time. In the current study, we instead aim to use RL to determine the values for these variables, based on intruder position and traffic density. The analysis in this paper specifically addresses the difficulty of training algorithms in a cooperative multi-agent case to converge to optimal values. We prove that finding the right representation of state/rewards in a nonstationary environment is non-trivial and highly influences the learning process. Finally, we show that a variation of resolution manoeuvres can improve the safety of several scenarios at high traffic densities.

Original language	English
Title of host publication	10th SESAR Innovation Days
Number of pages	8
Publication status	Published - 2020
Event	10th SESAR Innovation Days - Virtual/online event due to COVID-19; Duration: 7 Dec 2020 → 10 Dec 2020

Conference

Conference	10th SESAR Innovation Days
Period	7/12/20 → 10/12/20

Bibliographical note

Virtual/online event due to COVID-19

Keywords

Conflict Detection and Resolution (CD&R)
Reinforcement Learning (RL)
Deep Deterministic Policy Gradient (DDPG)
U-Space
Unmanned Traffic Management (UTM)
Modified Voltage Potential (MVP)
BlueSky
ATC Simulator

Access to Document

SIDs_2020_paper_60red (Final published version, 300 KB)

Cite this

@inproceedings{31d670ae17994e2c8f02b4d06bde186d,

title = "Determining Optimal Conflict Avoidance Manoeuvres At High Densities With Reinforcement Learning",

abstract = "The use of drones for applications such as package delivery, in an urban setting, would result in traffic densities that are orders of magnitude higher than any observed in manned aviation. Current geometric resolution models have proven to be very efficient at relatively moderate densities. However, at higher densities, performance is hindered by the unpredictable emergent behaviour from neighbouring aircraft. In this paper, we use a hybrid solution between existing geometric resolution approaches and reinforcement learning (RL), directed at improving conflict resolution performance at high densities. We resort to a Deep Deterministic Policy Gradient (DDPG) model to improve the behaviour of the Modified Voltage Potential (MVP) geometric conflict resolution method. By default, the MVP method generates avoidance manoeuvres of a geometrically-defined type, using a fixed look-ahead time. In the current study, we instead aim to use RL to determine the values for these variables, based on intruder position and traffic density. The analysis in this paper specifically addresses the difficulty of training algorithms in a cooperative multi-agent case to converge to optimal values. We prove that finding the right representation of state/rewards in a nonstationary environment is non-trivial and highly influences the learning process. Finally, we show that a variation of resolution manoeuvres can improve the safety of several scenarios at high traffic densities. ",

keywords = "Conflict Detection and Resolution (CD&R), Reinforcement Learning (RL), Deep Deterministic Policy Gradient (DDPG), U-Space, Unmanned Traffic Management (UTM), Modified Voltage Potential (MVP), BlueSky, ATC Simulator",

author = "M.J. Ribeiro and J. Ellerbroek and J.M. Hoekstra",

note = "Virtual/online event due to COVID-19 ; 10th SESAR Innovation Days ; Conference date: 07-12-2020 Through 10-12-2020",

year = "2020",

language = "English",

booktitle = "10th SESAR Innovation Days",

}

TY - GEN

T1 - Determining Optimal Conflict Avoidance Manoeuvres At High Densities With Reinforcement Learning

AU - Ribeiro, M.J.

AU - Ellerbroek, J.

AU - Hoekstra, J.M.

N1 - Virtual/online event due to COVID-19

PY - 2020

Y1 - 2020

N2 - The use of drones for applications such as package delivery, in an urban setting, would result in traffic densities that are orders of magnitude higher than any observed in manned aviation. Current geometric resolution models have proven to be very efficient at relatively moderate densities. However, at higher densities, performance is hindered by the unpredictable emergent behaviour from neighbouring aircraft. In this paper, we use a hybrid solution between existing geometric resolution approaches and reinforcement learning (RL), directed at improving conflict resolution performance at high densities. We resort to a Deep Deterministic Policy Gradient (DDPG) model to improve the behaviour of the Modified Voltage Potential (MVP) geometric conflict resolution method. By default, the MVP method generates avoidance manoeuvres of a geometrically-defined type, using a fixed look-ahead time. In the current study, we instead aim to use RL to determine the values for these variables, based on intruder position and traffic density. The analysis in this paper specifically addresses the difficulty of training algorithms in a cooperative multi-agent case to converge to optimal values. We prove that finding the right representation of state/rewards in a nonstationary environment is non-trivial and highly influences the learning process. Finally, we show that a variation of resolution manoeuvres can improve the safety of several scenarios at high traffic densities.

KW - Conflict Detection and Resolution (CD&R)

KW - Reinforcement Learning (RL)

KW - Deep Deterministic Policy Gradient (DDPG)

KW - U-Space

KW - Unmanned Traffic Management (UTM)

KW - Modified Voltage Potential (MVP)

KW - BlueSky

KW - ATC Simulator

M3 - Conference contribution

BT - 10th SESAR Innovation Days

T2 - 10th SESAR Innovation Days

Y2 - 7 December 2020 through 10 December 2020

ER -
