Policy derivation methods for critic-only reinforcement learning in continuous spaces

Eduard Alibekov, Jiri Kubalik, Robert Babuska

Research output: Contribution to journal › Article › peer-review


Abstract

This paper addresses the problem of deriving a policy from the value function in the context of critic-only reinforcement learning (RL) in continuous state and action spaces. With continuous-valued states, RL algorithms have to rely on a numerical approximator to represent the value function. By its nature, numerical approximation virtually always exhibits artifacts that degrade the overall performance of the controlled system. In addition, when continuous-valued actions are used, the most common approach is to discretize the action space and exhaustively search for the action that maximizes the right-hand side of the Bellman equation. Such a policy derivation procedure is computationally involved and results in steady-state error due to the lack of continuity. In this work, we propose policy derivation methods that alleviate the above problems by means of action space refinement, continuous approximation, and post-processing of the V-function using symbolic regression. The proposed methods are tested on nonlinear control problems: 1-DOF and 2-DOF pendulum swing-up, and magnetic manipulation. The results show significantly improved performance in terms of cumulative return and computational complexity.
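The baseline procedure the abstract refers to, exhaustively searching a discretized action set for the action that maximizes the right-hand side of the Bellman equation, can be sketched as follows. This is a minimal illustration, not the authors' code; the dynamics f, reward rho, value approximator V, and discount gamma are placeholder assumptions used only to show the interface.

```python
import numpy as np

def derive_action(x, actions, f, rho, V, gamma=0.99):
    """Baseline policy derivation: exhaustive search over a discretized
    action set, picking the action that maximizes
        rho(x, a) + gamma * V(f(x, a)),
    i.e. the right-hand side of the Bellman equation at state x."""
    best_a, best_q = None, -np.inf
    for a in actions:
        x_next = f(x, a)                      # one-step model prediction
        q = rho(x, a) + gamma * V(x_next)     # Bellman right-hand side
        if q > best_q:
            best_a, best_q = a, q
    return best_a

if __name__ == "__main__":
    # Toy placeholders (hypothetical, for illustration only):
    f = lambda x, a: x + 0.1 * a              # dummy integrator dynamics
    rho = lambda x, a: -x**2 - 0.01 * a**2    # quadratic reward
    V = lambda x: -x**2                       # stand-in for the learned V-function
    actions = np.linspace(-1.0, 1.0, 11)      # coarse action discretization
    print(derive_action(0.5, actions, f, rho, V))
```

The coarse grid in this sketch is what causes the steady-state error mentioned in the abstract; the paper's proposed methods refine the action space and use continuous approximation to avoid it.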
Original language: English
Pages (from-to): 178-187
Journal: Engineering Applications of Artificial Intelligence
Volume: 69
DOIs
Publication status: Published - 2018

Bibliographical note

Accepted Author Manuscript

Keywords

  • Reinforcement learning
  • Continuous actions
  • Multi-variable systems
  • Optimal control
  • Policy derivation
  • Optimization
