Efficient exploration with Double Uncertain Value Networks

Thomas Moerland; Joost Broekens; Catholijn Jonker

Interactive Intelligence

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific

Abstract

This paper studies directed exploration for reinforcement learning agents by tracking uncertainty about the value of each available action. We identify two sources of uncertainty that are relevant for exploration: the first originates from limited data (parametric uncertainty), while the second originates from the distribution of the returns (return uncertainty). We describe methods to learn these distributions with deep neural networks: parametric uncertainty is estimated with Bayesian dropout, while return uncertainty is propagated through the Bellman equation as a Gaussian distribution. We then show that both can be jointly estimated in one network, which we call the Double Uncertain Value Network. The policy is derived directly from the learned distributions using Thompson sampling. Experimental results show that both types of uncertainty may vastly improve learning in domains with a strong exploration challenge.
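The two mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the list-of-masks data layout, and the deterministic-reward assumption are all hypothetical choices made for illustration. The first function uses the standard fact that an affine transform of a Gaussian is Gaussian; the second stands in for a dropout network by taking precomputed per-action (mean, std) predictions, one set per dropout mask.

```python
import random


def bellman_gaussian_target(reward, gamma, next_mean, next_std):
    """Push a Gaussian return estimate one step back through the Bellman equation.

    Assumption: the reward is deterministic. If Z' ~ N(next_mean, next_std^2),
    then reward + gamma * Z' ~ N(reward + gamma * next_mean, (gamma * next_std)^2).
    """
    return reward + gamma * next_mean, gamma * next_std


def thompson_action(mask_predictions, rng):
    """Select an action by Thompson sampling over both uncertainty sources.

    mask_predictions is a hypothetical stand-in for a dropout network: a list
    with one entry per dropout mask, each entry a list of (mean, std) pairs,
    one pair per action.
    """
    # Parametric uncertainty: choosing one dropout mask corresponds to
    # drawing one sample of the network parameters.
    per_action = rng.choice(mask_predictions)
    # Return uncertainty: draw one return from each action's Gaussian.
    draws = [rng.gauss(mean, std) for mean, std in per_action]
    # Act greedily with respect to the sampled returns.
    return max(range(len(draws)), key=draws.__getitem__)


# Usage: two actions, two hypothetical dropout masks.
rng = random.Random(0)
masks = [
    [(1.0, 0.5), (0.8, 2.0)],
    [(0.9, 0.5), (1.2, 2.0)],
]
counts = [0, 0]
for _ in range(1000):
    counts[thompson_action(masks, rng)] += 1
print(counts)
```

In this toy setup the second action has the wider return distribution, so it is still selected regularly even under the mask that gives it the lower mean, which is the directed-exploration effect the abstract refers to.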

Original language	English
Title of host publication	Deep Reinforcement Learning Symposium, NIPS 2017
Pages	1-17
Number of pages	17
Publication status	Published - 2017
Event	NIPS 2017: Thirty-first Conference on Neural Information Processing Systems, Long Beach, United States. Duration: 7 Dec 2017 → 7 Dec 2017. Conference number: 31st

Conference

Conference	NIPS 2017
Country/Territory	United States
City	Long Beach
Period	7/12/17 → 7/12/17

Access to Document

MoerlandBroekensJonker_EfficientExplorationwithDoubleUncertainValueNetworks_NIPS - Thomas Moerland (1): Final published version, 1.79 MB

https://sites.google.com/view/deeprl-symposium-nips2017/home

Cite this

@inproceedings{615d6642d3754f61b1aa6d69c9160bbb,

title = "Efficient exploration with Double Uncertain Value Networks",

author = "Thomas Moerland and Joost Broekens and Catholijn Jonker",

year = "2017",

language = "English",

pages = "1--17",

booktitle = "Deep Reinforcement Learning Symposium, NIPS 2017",

note = "NIPS 2017 : Thirty-first Conference on Neural Information Processing Systems ; Conference date: 07-12-2017 Through 07-12-2017",

}

TY - GEN

T1 - Efficient exploration with Double Uncertain Value Networks

AU - Moerland, Thomas

AU - Broekens, Joost

AU - Jonker, Catholijn

N1 - Conference code: 31st

PY - 2017

Y1 - 2017

M3 - Conference contribution

SP - 1

EP - 17

BT - Deep Reinforcement Learning Symposium, NIPS 2017

T2 - NIPS 2017

Y2 - 7 December 2017 through 7 December 2017

ER -
