TY - GEN
T1 - Multi-agent Hierarchical Reinforcement Learning with Dynamic Termination
AU - Han, Dongge
AU - Böhmer, Wendelin
AU - Wooldridge, Michael
AU - Rogers, Alex
PY - 2019
Y1 - 2019
N2 - In a multi-agent system, an agent’s optimal policy will typically depend on the policies chosen by others. Therefore, a key issue in multi-agent systems research is that of predicting the behaviours of others, and responding promptly to changes in such behaviours. One obvious possibility is for each agent to broadcast their current intention, for example, the currently executed option in a hierarchical reinforcement learning framework. However, this approach results in inflexibility of agents if options have an extended duration and are dynamic. While adjusting the executed option at each step improves flexibility from a single-agent perspective, frequent changes in options can induce inconsistency between an agent’s actual behaviour and its broadcast intention. In order to balance flexibility and predictability, we propose a dynamic termination Bellman equation that allows the agents to flexibly terminate their options. We evaluate our models empirically on a set of multi-agent pursuit and taxi tasks, and show that our agents learn to adapt flexibly across scenarios that require different termination behaviours.
KW - Hierarchical reinforcement learning
KW - Multi-agent learning
UR - http://www.scopus.com/inward/record.url?scp=85072865943&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-29911-8_7
DO - 10.1007/978-3-030-29911-8_7
M3 - Conference contribution
AN - SCOPUS:85072865943
SN - 9783030299101
VL - 11671
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 80
EP - 92
BT - PRICAI 2019
A2 - Nayak, Abhaya C.
A2 - Sharma, Alok
PB - Springer
T2 - 16th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2019
Y2 - 26 August 2019 through 30 August 2019
ER -