Learning state representation for deep actor-critic control

J. Munk; Jens Kober; Robert Babuska

doi:10.1109/CDC.2016.7798980

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

Deep Neural Networks (DNNs) can be used as function approximators in Reinforcement Learning (RL). One advantage of DNNs is that they can cope with large input dimensions. Instead of relying on feature engineering to lower the input dimension, DNNs can extract the features from raw observations. The drawback of this end-to-end learning is that it usually requires a large amount of data, which for real-world control applications is not always available. In this paper, a new algorithm, Model Learning Deep Deterministic Policy Gradient (ML-DDPG), is proposed that combines RL with state representation learning, i.e., learning a mapping from an input vector to a state before solving the RL task. The ML-DDPG algorithm uses a concept we call predictive priors to learn a model network which is subsequently used to pre-train the first layer of the actor and critic networks. Simulation results show that ML-DDPG can learn reasonable continuous control policies from high-dimensional observations that also contain task-irrelevant information. Furthermore, in some cases, this approach significantly improves the final performance in comparison to end-to-end learning.
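
The two-stage idea in the abstract (learn a model network first, then reuse its first layer in the actor and critic) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a linear first layer with a single state unit, a reward-only prediction target (the abstract does not spell out what the predictive priors predict), and hypothetical names throughout (`encode`, `train_model_network`, `init_actor_critic`, the dictionary layout of the networks).

```python
import random

def encode(W, obs):
    """First layer: map a raw observation to a learned state (linear here)."""
    return [sum(w * o for w, o in zip(row, obs)) for row in W]

def train_model_network(transitions, lr=0.05, epochs=200):
    """Fit the first layer W and a linear reward head v by plain SGD.

    ASSUMPTION: the predictive objective is the squared reward-prediction
    error; the abstract does not state the actual prediction targets.
    """
    W = [[0.1, 0.1]]  # 1 state unit, 2 observation inputs
    v = [0.1, 0.0]    # head weights on [state, action]
    for _ in range(epochs):
        for obs, action, reward in transitions:
            feats = encode(W, obs) + [action]
            err = sum(vi * f for vi, f in zip(v, feats)) - reward
            grad_state = err * v[0]  # d(0.5 * err^2) / d(state)
            W[0] = [w - lr * grad_state * o for w, o in zip(W[0], obs)]
            v = [vi - lr * err * f for vi, f in zip(v, feats)]
    return W, v

def init_actor_critic(W):
    """Copy the pre-trained first layer into both networks (hypothetical layout)."""
    actor = {"first_layer": [row[:] for row in W], "head": [0.0]}
    critic = {"first_layer": [row[:] for row in W], "head": [0.0, 0.0]}
    return actor, critic

# Toy data: obs = [x, distractor]; the reward depends on x and the action only,
# so the distractor plays the role of task-irrelevant information.
rng = random.Random(0)
transitions = []
for _ in range(50):
    x, distractor, action = (rng.uniform(-1, 1) for _ in range(3))
    transitions.append(([x, distractor], action, 2.0 * x - action))

W, v = train_model_network(transitions)
actor, critic = init_actor_critic(W)
print("first-layer weights [x, distractor]:", W[0])
```

In this toy setting the first-layer weight on the distractor input shrinks towards zero, because only `x` helps predict the reward. The actor and critic then start from a first layer that already ignores the irrelevant input, and the subsequent RL training (not shown) would proceed from there.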

Original language	English
Title of host publication	Proceedings 2016 IEEE 55th Conference on Decision and Control (CDC)
Editors	Francesco Bullo, Christophe Prieur, Alessandro Giua
Place of Publication	Piscataway, NJ, USA
Publisher	IEEE
Pages	4667-4673
ISBN (Print)	978-1-5090-1837-6
DOIs	https://doi.org/10.1109/CDC.2016.7798980
Publication status	Published - 2016
Event	55th IEEE Conference on Decision and Control, CDC 2016 - Las Vegas, United States. Duration: 12 Dec 2016 → 14 Dec 2016

Conference

Conference	55th IEEE Conference on Decision and Control, CDC 2016
Abbreviated title	CDC 2016
Country/Territory	United States
City	Las Vegas
Period	12/12/16 → 14/12/16

Bibliographical note

Accepted Author Manuscript

Keywords

Approximation algorithms
Robot sensing systems
Algorithm design and analysis
Prediction algorithms
Learning (artificial intelligence)
Feature extraction

Access to Document

10.1109/CDC.2016.7798980

Jelle_Munk_CDC2016_author_version (Accepted author manuscript, 357 KB)

Cite this

@inproceedings{1830de68f008471f898f0665c2a907d2,

title = "Learning state representation for deep actor-critic control",

abstract = "Deep Neural Networks (DNNs) can be used as function approximators in Reinforcement Learning (RL). One advantage of DNNs is that they can cope with large input dimensions. Instead of relying on feature engineering to lower the input dimension, DNNs can extract the features from raw observations. The drawback of this end-to-end learning is that it usually requires a large amount of data, which for real-world control applications is not always available. In this paper, a new algorithm, Model Learning Deep Deterministic Policy Gradient (ML-DDPG), is proposed that combines RL with state representation learning, i.e., learning a mapping from an input vector to a state before solving the RL task. The ML-DDPG algorithm uses a concept we call predictive priors to learn a model network which is subsequently used to pre-train the first layer of the actor and critic networks. Simulation results show that the ML-DDPG can learn reasonable continuous control policies from high-dimensional observations that contain also task-irrelevant information. Furthermore, in some cases, this approach significantly improves the final performance in comparison to end-to-end learning.",

keywords = "Approximation algorithms, Robot sensing systems, Algorithm design and analysis, Prediction algorithms, Learning (artificial intelligence), Feature extraction",

author = "J. Munk and Jens Kober and Robert Babuska",

note = "Accepted Author Manuscript; 55th IEEE Conference on Decision and Control, CDC 2016; Conference date: 12-12-2016 through 14-12-2016",

year = "2016",

doi = "10.1109/CDC.2016.7798980",

language = "English",

isbn = "978-1-5090-1837-6",

pages = "4667--4673",

editor = "Bullo, Francesco and Prieur, Christophe and Giua, Alessandro",

booktitle = "Proceedings 2016 IEEE 55th Conference on Decision and Control (CDC)",

publisher = "IEEE",

address = "Piscataway, NJ, USA",

}

Munk, J, Kober, J & Babuska, R 2016, Learning state representation for deep actor-critic control. in F Bullo, C Prieur & A Giua (eds), Proceedings 2016 IEEE 55th Conference on Decision and Control (CDC). IEEE, Piscataway, NJ, USA, pp. 4667-4673, 55th IEEE Conference on Decision and Control, CDC 2016, Las Vegas, United States, 12/12/16. https://doi.org/10.1109/CDC.2016.7798980

Learning state representation for deep actor-critic control. / Munk, J.; Kober, Jens; Babuska, Robert.
Proceedings 2016 IEEE 55th Conference on Decision and Control (CDC). ed. / Francesco Bullo; Christophe Prieur; Alessandro Giua. Piscataway, NJ, USA: IEEE, 2016. p. 4667-4673.

TY - GEN

T1 - Learning state representation for deep actor-critic control

AU - Munk, J.

AU - Kober, Jens

AU - Babuska, Robert

N1 - Accepted Author Manuscript

PY - 2016

Y1 - 2016

AB - Deep Neural Networks (DNNs) can be used as function approximators in Reinforcement Learning (RL). One advantage of DNNs is that they can cope with large input dimensions. Instead of relying on feature engineering to lower the input dimension, DNNs can extract the features from raw observations. The drawback of this end-to-end learning is that it usually requires a large amount of data, which for real-world control applications is not always available. In this paper, a new algorithm, Model Learning Deep Deterministic Policy Gradient (ML-DDPG), is proposed that combines RL with state representation learning, i.e., learning a mapping from an input vector to a state before solving the RL task. The ML-DDPG algorithm uses a concept we call predictive priors to learn a model network which is subsequently used to pre-train the first layer of the actor and critic networks. Simulation results show that the ML-DDPG can learn reasonable continuous control policies from high-dimensional observations that contain also task-irrelevant information. Furthermore, in some cases, this approach significantly improves the final performance in comparison to end-to-end learning.

KW - Approximation algorithms

KW - Robot sensing systems

KW - Algorithm design and analysis

KW - Prediction algorithms

KW - Learning (artificial intelligence)

KW - Feature extraction

UR - http://resolver.tudelft.nl/uuid:1830de68-f008-471f-898f-0665c2a907d2

U2 - 10.1109/CDC.2016.7798980

DO - 10.1109/CDC.2016.7798980

M3 - Conference contribution

SN - 978-1-5090-1837-6

SP - 4667

EP - 4673

BT - Proceedings 2016 IEEE 55th Conference on Decision and Control (CDC)

A2 - Bullo, Francesco

A2 - Prieur, Christophe

A2 - Giua, Alessandro

PB - IEEE

CY - Piscataway, NJ, USA

T2 - 55th IEEE Conference on Decision and Control, CDC 2016

Y2 - 12 December 2016 through 14 December 2016

ER -
