TY - GEN
T1 - Multi-agent Hierarchical Reinforcement Learning with Dynamic Termination
AU - Han, Dongge
AU - Böhmer, Wendelin
AU - Wooldridge, Michael
AU - Rogers, Alex
PY - 2019
Y1 - 2019
N2 - In a multi-agent system, an agent’s optimal policy will typically depend on the policies chosen by others. Therefore, a key issue in multi-agent systems research is that of predicting the behaviours of others, and responding promptly to changes in such behaviours. One obvious possibility is for each agent to broadcast their current intention, for example, the currently executed option in a hierarchical reinforcement learning framework. However, this approach results in inflexibility of agents if options have an extended duration and are dynamic. While adjusting the executed option at each step improves flexibility from a single-agent perspective, frequent changes in options can induce inconsistency between an agent’s actual behaviour and its broadcast intention. In order to balance flexibility and predictability, we propose a dynamic termination Bellman equation that allows the agents to flexibly terminate their options. We evaluate our models empirically on a set of multi-agent pursuit and taxi tasks, and show that our agents learn to adapt flexibly across scenarios that require different termination behaviours.
KW - Hierarchical reinforcement learning
KW - Multi-agent learning
UR - http://www.scopus.com/inward/record.url?scp=85072865943&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-29911-8_7
DO - 10.1007/978-3-030-29911-8_7
M3 - Conference contribution
AN - SCOPUS:85072865943
SN - 9783030299101
VL - 11671
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 80
EP - 92
BT - PRICAI 2019
A2 - Nayak, Abhaya C.
A2 - Sharma, Alok
PB - Springer
T2 - 16th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2019
Y2 - 26 August 2019 through 30 August 2019
ER -