Exploring the Effects of Conditioning Independent Q-Learners on the Sufficient Statistic for Dec-POMDPs

A.V. Mandersloot, F.A. Oliehoek, A.T. Czechowski

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review


Abstract

In this study, we investigate the effects of conditioning Independent Q-Learners (IQL) not solely on their individual action-observation histories, but additionally on the sufficient plan-time statistic for Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs). In doing so, we attempt to address a key shortcoming of IQL: it is likely to converge to a Nash equilibrium that can be arbitrarily poor. We identify a novel exploration strategy for IQL when it conditions on the sufficient statistic, and furthermore show that sub-optimal equilibria can be escaped consistently by sequencing the decision-making during learning. The main practical limitation is that both the sufficient statistic and the decision rules grow exponentially with the horizon.
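To make the idea concrete: at stage t the plan-time sufficient statistic is a distribution over joint action-observation histories induced by the past joint decision rules (roughly, sigma_t(theta_t) = Pr(theta_t | delta_0, ..., delta_{t-1})), and each learner's Q-function is indexed by this statistic in addition to its own history. The following is a minimal tabular sketch of such a statistic-conditioned independent Q-learner; it is not the authors' implementation, and the hashable encodings of the statistic and the history are assumptions made purely for illustration.

```python
from collections import defaultdict
import random

class StatConditionedIQL:
    """Tabular independent Q-learner whose Q-values condition on the
    plan-time sufficient statistic in addition to the agent's own
    action-observation history (AOH).

    `stat` is assumed to be a hashable encoding of the sufficient
    statistic (e.g. a tuple of (joint-history, probability) pairs);
    `aoh` is the agent's own AOH as a tuple of (action, observation)
    pairs. Both encodings are illustrative assumptions, not the
    paper's representation.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)   # (stat, aoh, action) -> Q-value

    def act(self, stat, aoh):
        # Epsilon-greedy exploration over the augmented
        # (statistic, history) state.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(stat, aoh, a)])

    def update(self, stat, aoh, action, reward, next_stat, next_aoh, done):
        # Standard Q-learning backup, applied to the augmented state.
        target = reward if done else reward + self.gamma * max(
            self.q[(next_stat, next_aoh, a)] for a in self.actions
        )
        key = (stat, aoh, action)
        self.q[key] += self.alpha * (target - self.q[key])
```

Because the number of joint histories, and hence of distinct values of `stat`, grows exponentially with the horizon, the table above blows up quickly; this is exactly the practical limitation the abstract notes.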
Original language: English
Title of host publication: BNAIC/BeneLearn 2020
Editors: Lu Cao, Walter Kosters, Jefrey Lijffijt
Publisher: RU Leiden
Pages: 423-424
Publication status: Published - 19 Nov 2020
Event: BNAIC/BENELEARN 2020 - Leiden, Netherlands
Duration: 19 Nov 2020 - 20 Nov 2020

Conference

Conference: BNAIC/BENELEARN 2020
Country/Territory: Netherlands
City: Leiden
Period: 19/11/20 - 20/11/20

Keywords

  • Deep Reinforcement Learning
  • Multi-Agent
  • Partial Observability
  • Decentralized Execution
