Abstract
In this study, we investigate the effects of conditioning Independent Q-Learners (IQL) not solely on their individual action-observation histories, but additionally on the sufficient plan-time statistic for Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs). In doing so, we address a key shortcoming of IQL: it is likely to converge to a Nash equilibrium that can be arbitrarily poor. We identify a novel exploration strategy for IQL when it conditions on the sufficient statistic, and furthermore show that sub-optimal equilibria can be escaped consistently by sequencing the decision-making during learning. The practical limitation is the exponential complexity of both the sufficient statistic and the decision rules.
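The core idea of the abstract — each agent learns independently, but its Q-values are keyed on both its private history and a shared plan-time statistic — can be sketched as follows. This is an illustrative tabular sketch under our own assumptions (class and parameter names are hypothetical), not the paper's implementation:

```python
import random
from collections import defaultdict

class IndependentQLearner:
    """Tabular independent Q-learner whose state key combines the agent's
    private action-observation history with a shared plan-time statistic."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(float)  # (state_key, action) -> Q-value
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, history, statistic):
        # Condition on BOTH the private history and the shared statistic.
        key = (history, statistic)
        if random.random() < self.epsilon:
            return random.choice(self.actions)  # epsilon-greedy exploration
        return max(self.actions, key=lambda a: self.q[(key, a)])

    def update(self, history, statistic, action, reward,
               next_history, next_statistic):
        # Standard one-step Q-learning update on the augmented state key.
        key = (history, statistic)
        nxt = (next_history, next_statistic)
        best_next = max(self.q[(nxt, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(key, action)] += self.alpha * (td_target - self.q[(key, action)])
```

Because each agent's table is indexed by the full history-statistic pair, its size grows exponentially with the horizon — the practical limitation the abstract notes.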
| Original language | English |
|---|---|
| Title of host publication | BNAIC/BeneLearn 2020 |
| Editors | Lu Cao, Walter Kosters, Jefrey Lijffijt |
| Publisher | RU Leiden |
| Pages | 423-424 |
| Publication status | Published - 19 Nov 2020 |
| Event | BNAIC/BENELEARN 2020 - Leiden, Netherlands |
| Duration | 19 Nov 2020 → 20 Nov 2020 |
Conference
| Conference | BNAIC/BENELEARN 2020 |
|---|---|
| Country/Territory | Netherlands |
| City | Leiden |
| Period | 19/11/20 → 20/11/20 |
Keywords
- Deep Reinforcement Learning
- Multi-Agent
- Partial Observability
- Decentralized Execution
Title: Exploring the Effects of Conditioning Independent Q-Learners on the Sufficient Statistic for Dec-POMDPs