UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning

Tarun Gupta; Anuj Mahajan; Bei Peng; Wendelin Böhmer; Shimon Whiteson

Algorithmics

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

VDN and QMIX are two popular value-based algorithms for cooperative MARL that learn a centralized action value function as a monotonic mixing of per-agent utilities. While this enables easy decentralization of the learned policy, the restricted joint action value function can prevent them from solving tasks that require significant coordination between agents at a given timestep. We show that this problem can be overcome by improving the joint exploration of all agents during training. Specifically, we propose a novel MARL approach called Universal Value Exploration (UneVEn) that learns a set of related tasks simultaneously with a linear decomposition of universal successor features. With the policies of already solved related tasks, the joint exploration process of all agents can be improved to help them achieve better coordination. Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
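
As a rough illustration of why a monotonic mixing of per-agent utilities "enables easy decentralization", the sketch below uses the simplest such mixing, a plain sum in the style of VDN. It checks that each agent greedily maximizing its own utility recovers the same joint action as a brute-force maximization of the summed value. The utility numbers and the function names are hypothetical, chosen only for this example; this is not the authors' implementation and it does not model UneVEn itself.

```python
from itertools import product


def vdn_joint_value(utilities, joint_action):
    """Sum the per-agent utilities selected by a joint action (VDN-style mixing)."""
    return sum(u[a] for u, a in zip(utilities, joint_action))


def decentralized_greedy(utilities):
    """Each agent independently picks the action index with its highest utility."""
    return tuple(max(range(len(u)), key=u.__getitem__) for u in utilities)


def centralized_greedy(utilities):
    """Brute-force argmax of the summed value over every joint action."""
    joint_actions = product(*(range(len(u)) for u in utilities))
    return max(joint_actions, key=lambda ja: vdn_joint_value(utilities, ja))


# Hypothetical utilities: two agents, three actions each (illustrative values only).
utilities = [[0.2, 1.5, -0.3], [0.7, 0.1, 2.0]]

# Under additive mixing the two selection procedures agree.
assert decentralized_greedy(utilities) == centralized_greedy(utilities)
```

The same additive structure is what restricts the representable joint action values: a payoff that is high only when both agents deviate together cannot be written as a sum of independent per-agent terms, which is the coordination problem the abstract refers to.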

Original language	English
Title of host publication	Proceedings of the International Conference on Machine Learning (ICML)
Editors	Marina Meila, Tong Zhang
Pages	3930-3941
Number of pages	12
Volume	139
Publication status	Published - 2021
Event	International Conference on Machine Learning: 2021 (38th), 18 Jul 2021 → 24 Jul 2021, https://icml.cc/Conferences/2021

Publication series

Name	Proceedings of Machine Learning Research
Volume	PMLR 139
ISSN (Electronic)	2640-3498

Conference

Conference	International Conference on Machine Learning
Abbreviated title	ICML
Period	18/07/21 → 24/07/21
Internet address	https://icml.cc/Conferences/2021

Access to Document

gupta21a(1): Final published version, 2.75 MB

Cite this

@inproceedings{97fa6f7ac8864fa4ade865326092dfb0,
title = "UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning",
abstract = "VDN and QMIX are two popular value-based algorithms for cooperative MARL that learn a centralized action value function as a monotonic mixing of per-agent utilities. While this enables easy decentralization of the learned policy, the restricted joint action value function can prevent them from solving tasks that require significant coordination between agents at a given timestep. We show that this problem can be overcome by improving the joint exploration of all agents during training. Specifically, we propose a novel MARL approach called Universal Value Exploration (UneVEn) that learns a set of related tasks simultaneously with a linear decomposition of universal successor features. With the policies of already solved related tasks, the joint exploration process of all agents can be improved to help them achieve better coordination. Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.",
author = "Tarun Gupta and Anuj Mahajan and Bei Peng and Wendelin B{\"o}hmer and Shimon Whiteson",
year = "2021",
language = "English",
volume = "139",
series = "Proceedings of Machine Learning Research",
pages = "3930--3941",
editor = "Marina Meila and Tong Zhang",
booktitle = "Proceedings of the International Conference on Machine Learning (ICML)",
note = "International Conference on Machine Learning: 2021, ICML; Conference date: 18-07-2021 through 24-07-2021",
url = "https://icml.cc/Conferences/2021",
}

Gupta, T, Mahajan, A, Peng, B, Böhmer, W & Whiteson, S 2021, UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning. in M Meila & T Zhang (eds), Proceedings of the International Conference on Machine Learning (ICML). vol. 139, Proceedings of Machine Learning Research, vol. PMLR 139, pp. 3930-3941, International Conference on Machine Learning, 18/07/21.

UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning. / Gupta, Tarun; Mahajan, Anuj; Peng, Bei et al.
Proceedings of the International Conference on Machine Learning (ICML). ed. / Marina Meila; Tong Zhang. Vol. 139 2021. p. 3930-3941 (Proceedings of Machine Learning Research; Vol. PMLR 139).

TY - GEN
T1 - UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning
AU - Gupta, Tarun
AU - Mahajan, Anuj
AU - Peng, Bei
AU - Böhmer, Wendelin
AU - Whiteson, Shimon
N1 - Conference code: 38th
PY - 2021
Y1 - 2021

AB - VDN and QMIX are two popular value-based algorithms for cooperative MARL that learn a centralized action value function as a monotonic mixing of per-agent utilities. While this enables easy decentralization of the learned policy, the restricted joint action value function can prevent them from solving tasks that require significant coordination between agents at a given timestep. We show that this problem can be overcome by improving the joint exploration of all agents during training. Specifically, we propose a novel MARL approach called Universal Value Exploration (UneVEn) that learns a set of related tasks simultaneously with a linear decomposition of universal successor features. With the policies of already solved related tasks, the joint exploration process of all agents can be improved to help them achieve better coordination. Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.

UR - http://proceedings.mlr.press/v139/gupta21a/gupta21a.pdf
M3 - Conference contribution
VL - 139
T3 - Proceedings of Machine Learning Research
SP - 3930
EP - 3941
BT - Proceedings of the International Conference on Machine Learning (ICML)
A2 - Meila, Marina
A2 - Zhang, Tong
T2 - International Conference on Machine Learning
Y2 - 18 July 2021 through 24 July 2021
ER -
