BADDr: Bayes-Adaptive Deep Dropout RL for POMDPs

Sammie Katt, Hai Nguyen, Frans A. Oliehoek, Christopher Amato

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

While reinforcement learning (RL) has made great advances in scalability, exploration and partial observability are still active research topics. In contrast, Bayesian RL (BRL) provides a principled answer to both state estimation and the exploration-exploitation trade-off, but struggles to scale. To tackle this challenge, BRL frameworks with various prior assumptions have been proposed, with varied success. This work presents a representation-agnostic formulation of BRL under partial observability, unifying the previous models under one theoretical umbrella. To demonstrate its practical significance, we also propose a novel derivation, Bayes-Adaptive Deep Dropout RL (BADDr), based on dropout networks. Under this parameterization, in contrast to previous work, maintaining the belief over the state and dynamics becomes a more scalable inference problem. We choose actions through Monte-Carlo tree search and show empirically that our method is competitive with state-of-the-art BRL methods on small domains while being able to solve much larger ones.

Original language: English
Title of host publication: International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022
Publisher: International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
Pages: 723-731
Number of pages: 9
ISBN (Electronic): 978-171385433-3
Publication status: Published - 2022
Event: 21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022 - Auckland, Virtual, New Zealand
Duration: 9 May 2022 to 13 May 2022

Publication series

Name: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
Volume: 2
ISSN (Print): 1548-8403
ISSN (Electronic): 1558-2914

Conference

Conference: 21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022
Country/Territory: New Zealand
City: Auckland, Virtual
Period: 9/05/22 to 13/05/22

Bibliographical note

Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

  • Bayesian RL
  • MCTS
  • POMDP
