Difference Rewards Policy Gradients

Jacopo Castellini, F.A. Oliehoek, Sam Devlin, Rahul Savani

Research output: Chapter in Book/Conference proceedings/Edited volume; Conference contribution; Scientific; peer-review

Abstract

Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an agent’s contribution to the overall performance, which is crucial for learning good policies. We propose a novel algorithm called Dr.Reinforce that explicitly tackles this by combining difference rewards with policy gradients to allow for learning decentralized policies when the reward function is known. By differencing the reward function directly, Dr.Reinforce avoids difficulties associated with learning the Q-function as done by Counterfactual Multiagent Policy Gradients (COMA), a state-of-the-art difference rewards method. For applications where the reward function is unknown, we show the effectiveness of a version of Dr.Reinforce that learns a reward network that is used to estimate the difference rewards.
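
To make the idea of "differencing the reward function directly" concrete, the following is a minimal sketch, not the authors' implementation. It assumes one common form of difference reward: the shared reward minus the expected shared reward when a single agent's action is resampled from its own policy while the other agents' actions stay fixed. The names `difference_rewards` and `toy_reward`, the two-agent setup, and the reward values are all hypothetical and chosen only for illustration.

```python
import numpy as np


def difference_rewards(reward_fn, state, joint_action, policies):
    """Per-agent difference rewards for one joint action (illustrative sketch).

    reward_fn(state, joint_action) -> float is the known shared reward.
    policies[i] is agent i's action distribution at this state (1-D array).
    """
    joint_action = tuple(joint_action)
    shared = reward_fn(state, joint_action)
    diffs = []
    for i, pi in enumerate(policies):
        # Baseline: expected shared reward when only agent i's action is
        # resampled from its own policy; the other agents' actions are fixed.
        baseline = 0.0
        for alt, prob in enumerate(pi):
            counterfactual = joint_action[:i] + (alt,) + joint_action[i + 1:]
            baseline += prob * reward_fn(state, counterfactual)
        diffs.append(shared - baseline)
    return np.array(diffs)


# Hypothetical toy reward: the team scores 1 only if every agent picks action 1.
def toy_reward(state, joint_action):
    return float(all(a == 1 for a in joint_action))


# Assumed action distributions for two agents with two actions each.
policies = [np.array([0.5, 0.5]), np.array([0.2, 0.8])]
dr = difference_rewards(toy_reward, None, (1, 1), policies)
```

In a REINFORCE-style update, each agent would then weight the gradient of its own log-probability by its difference return instead of the shared return; when the reward function is unknown, `reward_fn` could stand for a learned reward model, as the abstract describes.
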
Original language: English
Title of host publication: Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems
Place of Publication: Richland, SC
Publisher: International Foundation for Autonomous Agents and Multiagent Systems
Pages: 1475-1477
Number of pages: 3
ISBN (Electronic): 9781450383073
Publication status: Published - 2021
Event: 20th International Conference on Autonomous Agents and Multiagent Systems - Virtual/online event due to COVID-19
Duration: 3 May 2021 to 7 May 2021
Conference number: 20

Publication series

Name: AAMAS '21
Publisher: International Foundation for Autonomous Agents and Multiagent Systems
ISSN (Electronic): 2523-5699

Conference

Conference: 20th International Conference on Autonomous Agents and Multiagent Systems
Abbreviated title: AAMAS 2021
Period: 3/05/21 to 7/05/21

Keywords

  • Multi-Agent Reinforcement Learning
  • Policy Gradients
  • Difference Rewards
  • Multi-Agent Credit Assignment
  • Reward Learning
