Abstract
While reinforcement learning (RL) has made great advances in scalability, exploration and partial observability are still active research topics. In contrast, Bayesian RL (BRL) provides a principled answer to both state estimation and the exploration-exploitation trade-off, but struggles to scale. To tackle this challenge, BRL frameworks with various prior assumptions have been proposed, with varied success. This work presents a representation-agnostic formulation of BRL under partially observability, unifying the previous models under one theoretical umbrella. To demonstrate its practical significance we also propose a novel derivation, Bayes-Adaptive Deep Dropout rl (BADDr), based on dropout networks. Under this parameterization, in contrast to previous work, the belief over the state and dynamics is a more scalable inference problem. We choose actions through Monte-Carlo tree search and empirically show that our method is competitive with state-of-the-art BRL methods on small domains while being able to solve much larger ones.
Original language | English |
---|---|
Title of host publication | International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022 |
Publisher | International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS) |
Pages | 723-731 |
Number of pages | 9 |
ISBN (Electronic) | 978-171385433-3 |
Publication status | Published - 2022 |
Event | 21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022 - Auckland, Virtual, New Zealand Duration: 9 May 2022 → 13 May 2022 |
Publication series
Name | Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS |
---|---|
Volume | 2 |
ISSN (Print) | 1548-8403 |
ISSN (Electronic) | 1558-2914 |
Conference
Conference | 21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022 |
---|---|
Country/Territory | New Zealand |
City | Auckland, Virtual |
Period | 9/05/22 → 13/05/22 |
Bibliographical note
Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-careOtherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
Keywords
- Bayesian RL
- MCTS
- POMDP