Difference Rewards Policy Gradients

Jacopo Castellini; F.A. Oliehoek; Sam Devlin; Rahul Savani

Interactive Intelligence

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an agent’s contribution to the overall performance, which is crucial for learning good policies. We propose a novel algorithm called Dr.Reinforce that explicitly tackles this by combining difference rewards with policy gradients to allow for learning decentralized policies when the reward function is known. By differencing the reward function directly, Dr.Reinforce avoids difficulties associated with learning the Q-function as done by Counterfactual Multiagent Policy Gradients (COMA), a state-of-the-art difference rewards method. For applications where the reward function is unknown, we show the effectiveness of a version of Dr.Reinforce that learns a reward network that is used to estimate the difference rewards.
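To make the idea of "differencing the reward function directly" concrete, here is a minimal sketch of one common form of difference reward: the joint reward minus the expected reward obtained when a single agent's action is resampled from its own policy while the other agents' actions stay fixed. This is an illustration under stated assumptions, not the paper's implementation. The function name `difference_reward`, the toy reward `count_ones`, and the uniform policy are all hypothetical, and the exact baseline used by Dr.Reinforce should be checked against the paper.

```python
from typing import Callable, Sequence, Tuple

JointAction = Tuple[int, ...]


def difference_reward(
    reward_fn: Callable[[object, JointAction], float],
    state: object,
    joint_action: JointAction,
    agent: int,
    policy_probs: Sequence[float],
) -> float:
    """Joint reward minus the agent's expected counterfactual reward.

    Assumption: the baseline marginalises the agent's own action under its
    current policy (policy_probs[a] is the probability of action a) while
    every other agent's action is held fixed.
    """
    baseline = 0.0
    for alt_action, prob in enumerate(policy_probs):
        # Swap only this agent's action; all other actions are unchanged.
        counterfactual = (
            joint_action[:agent] + (alt_action,) + joint_action[agent + 1:]
        )
        baseline += prob * reward_fn(state, counterfactual)
    return reward_fn(state, joint_action) - baseline


def count_ones(state: object, joint_action: JointAction) -> float:
    """Hypothetical known reward: how many agents chose action 1."""
    return float(sum(joint_action))


if __name__ == "__main__":
    uniform = [0.5, 0.5]  # hypothetical policy over two actions
    for i in range(2):
        print(i, difference_reward(count_ones, None, (1, 0), i, uniform))
```

In this toy setting the agent that chose action 1 receives a positive signal and the agent that chose action 0 receives a negative one, even though both observe the same joint reward. A REINFORCE-style update would then weight each agent's log-probability gradient by its own difference signal rather than by the shared reward.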

Original language	English
Title of host publication	Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems
Place of Publication	Richland, SC
Publisher	International Foundation for Autonomous Agents and Multiagent Systems
Pages	1463-1465
Number of pages	3
ISBN (Electronic)	9781450383073
Publication status	Published - 2021
Event	20th International Conference on Autonomous Agents and Multiagent Systems - Virtual/online event due to COVID-19 Duration: 3 May 2021 → 7 May 2021 Conference number: 20

Publication series

Name	AAMAS '21
Publisher	International Foundation for Autonomous Agents and Multiagent Systems
ISSN (Electronic)	2523-5699

Conference

Conference	20th International Conference on Autonomous Agents and Multiagent Systems
Abbreviated title	AAMAS 2021
Period	3/05/21 → 7/05/21

Keywords

Multi-Agent Reinforcement Learning
Policy Gradients
Difference Rewards
Multi-Agent Credit Assignment
Reward Learning

Access to Document

p1475: Final published version, 1.43 MB

http://www.ifaamas.org/Proceedings/aamas2021/pdfs/p1475.pdf

Cite this

@inproceedings{2721bb5158c347f1a6d2c18c24bc1f60,

title = "Difference Rewards Policy Gradients",

abstract = "Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an agent{\textquoteright}s contribution to the overall performance, which is crucial for learning good policies. We propose a novel algorithm called Dr.Reinforce that explicitly tackles this by combining difference rewards with policy gradients to allow for learning decentralized policies when the reward function is known. By differencing the reward function directly, Dr.Reinforce avoids difficulties associated with learning the Q-function as done by Counterfactual Multiagent Policy Gradients (COMA), a state-of-the-art difference rewards method. For applications where the reward function is unknown, we show the effectiveness of a version of Dr.Reinforce that learns a reward network that is used to estimate the difference rewards.",

keywords = "Multi-Agent Reinforcement Learning, Policy Gradients, Difference Rewards, Multi-Agent Credit Assignment, Reward Learning",

author = "Jacopo Castellini and F.A. Oliehoek and Sam Devlin and Rahul Savani",

year = "2021",

language = "English",

series = "AAMAS '21",

publisher = "International Foundation for Autonomous Agents and Multiagent Systems",

pages = "1463--1465",

booktitle = "Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems",

note = "20th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2021 ; Conference date: 03-05-2021 Through 07-05-2021",

}

Castellini, J, Oliehoek, FA, Devlin, S & Savani, R 2021, Difference Rewards Policy Gradients. in Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems. AAMAS '21, International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, pp. 1463-1465, 20th International Conference on Autonomous Agents and Multiagent Systems, 3/05/21. <http://www.ifaamas.org/Proceedings/aamas2021/pdfs/p1475.pdf>

Difference Rewards Policy Gradients. / Castellini, Jacopo; Oliehoek, F.A.; Devlin, Sam et al.
Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems. Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems, 2021. p. 1463-1465 (AAMAS '21).

TY - GEN

T1 - Difference Rewards Policy Gradients

AU - Castellini, Jacopo

AU - Oliehoek, F.A.

AU - Devlin, Sam

AU - Savani, Rahul

N1 - Conference code: 20

PY - 2021

Y1 - 2021

N2 - Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an agent’s contribution to the overall performance, which is crucial for learning good policies. We propose a novel algorithm called Dr.Reinforce that explicitly tackles this by combining difference rewards with policy gradients to allow for learning decentralized policies when the reward function is known. By differencing the reward function directly, Dr.Reinforce avoids difficulties associated with learning the Q-function as done by Counterfactual Multiagent Policy Gradients (COMA), a state-of-the-art difference rewards method. For applications where the reward function is unknown, we show the effectiveness of a version of Dr.Reinforce that learns a reward network that is used to estimate the difference rewards.

AB - Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an agent’s contribution to the overall performance, which is crucial for learning good policies. We propose a novel algorithm called Dr.Reinforce that explicitly tackles this by combining difference rewards with policy gradients to allow for learning decentralized policies when the reward function is known. By differencing the reward function directly, Dr.Reinforce avoids difficulties associated with learning the Q-function as done by Counterfactual Multiagent Policy Gradients (COMA), a state-of-the-art difference rewards method. For applications where the reward function is unknown, we show the effectiveness of a version of Dr.Reinforce that learns a reward network that is used to estimate the difference rewards.

KW - Multi-Agent Reinforcement Learning

KW - Policy Gradients

KW - Difference Rewards

KW - Multi-Agent Credit Assignment

KW - Reward Learning

UR - http://www.scopus.com/inward/record.url?scp=85110404945&partnerID=8YFLogxK

M3 - Conference contribution

T3 - AAMAS '21

SP - 1463

EP - 1465

BT - Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems

PB - International Foundation for Autonomous Agents and Multiagent Systems

CY - Richland, SC

T2 - 20th International Conference on Autonomous Agents and Multiagent Systems

Y2 - 3 May 2021 through 7 May 2021

ER -

Prizes

Best Paper Award

Cite this