The importance of experience replay database composition in deep reinforcement learning

Tim de Bruin, Jens Kober, K.P. Tuyls, Robert Babuska

    Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

    Abstract

    Recent years have seen a growing interest in the use of deep neural networks as
    function approximators in reinforcement learning. This paper investigates the potential of the Deep Deterministic Policy Gradient method for a robot control problem both in simulation and in a real setup. The importance of the size and composition of the experience replay database is investigated and some requirements on the distribution over the state-action space of the experiences in the database are identified. Of particular interest is the importance of negative experiences that are not close to an optimal policy. It is shown how training with samples that are insufficiently spread over the state-action space can cause the method to fail, and how maintaining the distribution over the state-action space of the samples in the experience database can greatly benefit learning.
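    The abstract's central point is that the spread of the stored experiences over the state-action space matters more than their sheer number. As a rough, hypothetical illustration only (the paper's actual database-maintenance strategy is not reproduced here), the sketch below shows a fixed-size replay buffer that, once full, overwrites the stored transition closest to the incoming one in state-action space rather than the oldest one, so that coverage of the space is preserved. All class and method names are invented for this example.

    import random
    import numpy as np

    class SpreadPreservingReplayBuffer:
        """Fixed-size experience replay buffer that tries to keep its
        contents spread over the state-action space (illustrative only)."""

        def __init__(self, capacity):
            self.capacity = capacity
            self.transitions = []   # stored (s, a, r, s_next, done) tuples
            self.keys = []          # concatenated [s, a] vectors used to measure spread

        def add(self, state, action, reward, next_state, done):
            key = np.concatenate([np.ravel(state), np.ravel(action)])
            transition = (state, action, reward, next_state, done)
            if len(self.transitions) < self.capacity:
                self.transitions.append(transition)
                self.keys.append(key)
            else:
                # Overwrite the most redundant entry: the stored transition
                # closest (in Euclidean distance) to the new one, rather than
                # the oldest, so rarely visited regions are not forgotten.
                distances = np.linalg.norm(np.stack(self.keys) - key, axis=1)
                idx = int(np.argmin(distances))
                self.transitions[idx] = transition
                self.keys[idx] = key

        def sample(self, batch_size):
            # Uniform minibatch sampling, as in standard experience replay.
            return random.sample(self.transitions, min(batch_size, len(self.transitions)))

    With such a buffer, an off-policy learner like DDPG would draw its minibatches from sample(), while add() decides which experiences are retained.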
    Original language: English
    Title of host publication: Deep Reinforcement Learning Workshop, NIPS 2015
    Number of pages: 9
    Publication status: Published - 2015
    Event: NIPS 2015: 29th Conference on Neural Information Processing Systems - Montreal, Canada
    Duration: 7 Dec 2015 - 12 Dec 2015

    Conference

    Conference: NIPS 2015: 29th Conference on Neural Information Processing Systems
    Country/Territory: Canada
    City: Montreal
    Period: 7/12/15 - 12/12/15

    Bibliographical note

    Deep Reinforcement Learning Workshop (on Friday December 11th).
