BADDr: Bayes-Adaptive Deep Dropout RL for POMDPs

Sammie Katt; Hai Nguyen; Frans A. Oliehoek; Christopher Amato

Interactive Intelligence

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review


Abstract

While reinforcement learning (RL) has made great advances in scalability, exploration and partial observability are still active research topics. In contrast, Bayesian RL (BRL) provides a principled answer to both state estimation and the exploration-exploitation trade-off, but struggles to scale. To tackle this challenge, BRL frameworks with various prior assumptions have been proposed, with varied success. This work presents a representation-agnostic formulation of BRL under partial observability, unifying the previous models under one theoretical umbrella. To demonstrate its practical significance, we also propose a novel derivation, Bayes-Adaptive Deep Dropout RL (BADDr), based on dropout networks. Under this parameterization, in contrast to previous work, the belief over the state and dynamics is a more scalable inference problem. We choose actions through Monte-Carlo tree search and empirically show that our method is competitive with state-of-the-art BRL methods on small domains while being able to solve much larger ones.

Original language	English
Title of host publication	International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022
Publisher	International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
Pages	723-731
Number of pages	9
ISBN (Electronic)	978-171385433-3
Publication status	Published - 2022
Event	21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022 - Auckland, Virtual, New Zealand. Duration: 9 May 2022 → 13 May 2022

Publication series

Name	Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
Volume	2
ISSN (Print)	1548-8403
ISSN (Electronic)	1558-2914

Conference

Conference	21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022
Country/Territory	New Zealand
City	Auckland, Virtual
Period	9/05/22 → 13/05/22

Bibliographical note

Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

Bayesian RL
MCTS
POMDP

Access to Document

3535850.3535932 (Final published version, 1.96 MB)

Cite this

Katt, S., Nguyen, H., Oliehoek, F. A., & Amato, C. (2022). BADDr: Bayes-Adaptive Deep Dropout RL for POMDPs. In International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022 (pp. 723-731). (Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS; Vol. 2). International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS).

@inproceedings{4f6dd8ebf5a94facb21c6639cf98b3eb,
title = "BADDr: Bayes-Adaptive Deep Dropout RL for POMDPs",
abstract = "While reinforcement learning (RL) has made great advances in scalability, exploration and partial observability are still active research topics. In contrast, Bayesian RL (BRL) provides a principled answer to both state estimation and the exploration-exploitation trade-off, but struggles to scale. To tackle this challenge, BRL frameworks with various prior assumptions have been proposed, with varied success. This work presents a representation-agnostic formulation of BRL under partial observability, unifying the previous models under one theoretical umbrella. To demonstrate its practical significance, we also propose a novel derivation, Bayes-Adaptive Deep Dropout RL (BADDr), based on dropout networks. Under this parameterization, in contrast to previous work, the belief over the state and dynamics is a more scalable inference problem. We choose actions through Monte-Carlo tree search and empirically show that our method is competitive with state-of-the-art BRL methods on small domains while being able to solve much larger ones.",
keywords = "Bayesian RL, MCTS, POMDP",
author = "Sammie Katt and Hai Nguyen and Oliehoek, {Frans A.} and Christopher Amato",
note = "Green Open Access added to TU Delft Institutional Repository {\textquoteleft}You share, we take care!{\textquoteright} – Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public. ; 21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022 ; Conference date: 09-05-2022 Through 13-05-2022",
year = "2022",
language = "English",
series = "Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS",
publisher = "International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)",
pages = "723--731",
booktitle = "International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022",
}


TY - GEN
T1 - BADDr
T2 - 21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022
AU - Katt, Sammie
AU - Nguyen, Hai
AU - Oliehoek, Frans A.
AU - Amato, Christopher
N1 - Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
PY - 2022
Y1 - 2022
N2 - While reinforcement learning (RL) has made great advances in scalability, exploration and partial observability are still active research topics. In contrast, Bayesian RL (BRL) provides a principled answer to both state estimation and the exploration-exploitation trade-off, but struggles to scale. To tackle this challenge, BRL frameworks with various prior assumptions have been proposed, with varied success. This work presents a representation-agnostic formulation of BRL under partial observability, unifying the previous models under one theoretical umbrella. To demonstrate its practical significance, we also propose a novel derivation, Bayes-Adaptive Deep Dropout RL (BADDr), based on dropout networks. Under this parameterization, in contrast to previous work, the belief over the state and dynamics is a more scalable inference problem. We choose actions through Monte-Carlo tree search and empirically show that our method is competitive with state-of-the-art BRL methods on small domains while being able to solve much larger ones.
KW - Bayesian RL
KW - MCTS
KW - POMDP
UR - http://www.scopus.com/inward/record.url?scp=85134294668&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85134294668
T3 - Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
SP - 723
EP - 731
BT - International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022
PB - International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
Y2 - 9 May 2022 through 13 May 2022
ER -

