Proxy functions for Approximate Reinforcement Learning

Eduard Alibekov; Jiří Kubalík; Robert Babuška

doi:10.1016/j.ifacol.2019.09.145

Learning & Autonomous Control

Research output: Contribution to journal › Conference article › Scientific › peer-review

Abstract

Approximate Reinforcement Learning (RL) is a method to solve sequential decision-making and dynamic control problems in an optimal way. This paper addresses RL methods for continuous state spaces that derive the control policy from an approximate value function (V-function). The standard approach to derive a policy through the V-function is analogous to hill climbing: at each state the RL agent chooses the control input that maximizes the right-hand side of the Bellman equation. Although theoretically optimal, the actual control performance of this method is heavily influenced by the local smoothness of the V-function; a lack of smoothness results in undesired closed-loop behavior with input chattering or limit cycles. To circumvent these problems, this paper provides a method based on Symbolic Regression to generate a locally smooth proxy to the V-function. The proposed method has been evaluated on two nonlinear control benchmarks: pendulum swing-up and magnetic manipulation. The new method has been compared with the standard policy derivation technique using the approximate V-function, and the results show that the proposed approach outperforms the standard one with respect to the cumulative return.
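The standard policy derivation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a deterministic system and a finite set of candidate inputs, so that the right-hand side of the Bellman equation reads r(x, u) + gamma * V(f(x, u)). The toy dynamics `f`, reward `r`, value function `V`, the input set `ACTIONS`, and the discount factor are all hypothetical choices made for illustration; they are not taken from the paper, and the Symbolic Regression proxy itself is not shown.

```python
from typing import Callable, Sequence


def greedy_action(
    x: float,
    actions: Sequence[float],
    f: Callable[[float, float], float],
    r: Callable[[float, float], float],
    V: Callable[[float], float],
    gamma: float = 0.95,
) -> float:
    """Return the input that maximizes r(x, u) + gamma * V(f(x, u))."""
    return max(actions, key=lambda u: r(x, u) + gamma * V(f(x, u)))


# Hypothetical toy problem: drive a scalar state towards zero.
def f(x: float, u: float) -> float:
    """Assumed deterministic dynamics: a small step in the direction of u."""
    return x + 0.1 * u


def r(x: float, u: float) -> float:
    """Assumed reward: penalize the distance of the current state from zero."""
    return -x * x


def V(x: float) -> float:
    """Assumed (smooth) approximate V-function."""
    return -x * x


ACTIONS = [-1.0, 0.0, 1.0]  # assumed discretized input set

print(greedy_action(1.0, ACTIONS, f, r, V))
print(greedy_action(-1.0, ACTIONS, f, r, V))
```

With this smooth `V`, the chosen input always points towards the origin. When `V` is a non-smooth approximation, the maximizing input can jump between neighboring states, which is the chattering behavior the abstract describes.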

Original language	English
Pages (from-to)	224-229
Journal	IFAC-PapersOnLine
Volume	52
Issue number	11
DOIs	https://doi.org/10.1016/j.ifacol.2019.09.145
Publication status	Published - 2019
Event	5th IFAC Conference on Intelligent Control and Automation Sciences, ICONS 2019 - Belfast, United Kingdom
Duration	21 Aug 2019 → 23 Aug 2019

Keywords

continuous state space
optimal control
policy derivation
reinforcement learning
V-function

Cite this

@article{a1fac80d87404622876159db38b4ea18,

title = "Proxy functions for Approximate Reinforcement Learning",

abstract = "Approximate Reinforcement Learning (RL) is a method to solve sequential decision-making and dynamic control problems in an optimal way. This paper addresses RL methods for continuous state spaces that derive the control policy from an approximate value function (V-function). The standard approach to derive a policy through the V-function is analogous to hill climbing: at each state the RL agent chooses the control input that maximizes the right-hand side of the Bellman equation. Although theoretically optimal, the actual control performance of this method is heavily influenced by the local smoothness of the V-function; a lack of smoothness results in undesired closed-loop behavior with input chattering or limit cycles. To circumvent these problems, this paper provides a method based on Symbolic Regression to generate a locally smooth proxy to the V-function. The proposed method has been evaluated on two nonlinear control benchmarks: pendulum swing-up and magnetic manipulation. The new method has been compared with the standard policy derivation technique using the approximate V-function, and the results show that the proposed approach outperforms the standard one with respect to the cumulative return.",

keywords = "continuous state space, optimal control, policy derivation, reinforcement learning, V-function",

author = "Eduard Alibekov and Ji{\v r}{\'i} Kubal{\'i}k and Robert Babu{\v s}ka",

year = "2019",

doi = "10.1016/j.ifacol.2019.09.145",

language = "English",

volume = "52",

pages = "224--229",

journal = "IFAC-PapersOnLine",

issn = "1474-6670",

publisher = "Elsevier",

number = "11",

note = "5th IFAC Conference on Intelligent Control and Automation Sciences, ICONS 2019 ; Conference date: 21-08-2019 Through 23-08-2019",

}

TY - JOUR

T1 - Proxy functions for Approximate Reinforcement Learning

AU - Alibekov, Eduard

AU - Kubalík, Jiří

AU - Babuška, Robert

PY - 2019

Y1 - 2019

N2 - Approximate Reinforcement Learning (RL) is a method to solve sequential decision-making and dynamic control problems in an optimal way. This paper addresses RL methods for continuous state spaces that derive the control policy from an approximate value function (V-function). The standard approach to derive a policy through the V-function is analogous to hill climbing: at each state the RL agent chooses the control input that maximizes the right-hand side of the Bellman equation. Although theoretically optimal, the actual control performance of this method is heavily influenced by the local smoothness of the V-function; a lack of smoothness results in undesired closed-loop behavior with input chattering or limit cycles. To circumvent these problems, this paper provides a method based on Symbolic Regression to generate a locally smooth proxy to the V-function. The proposed method has been evaluated on two nonlinear control benchmarks: pendulum swing-up and magnetic manipulation. The new method has been compared with the standard policy derivation technique using the approximate V-function, and the results show that the proposed approach outperforms the standard one with respect to the cumulative return.

KW - continuous state space

KW - optimal control

KW - policy derivation

KW - reinforcement learning

KW - V-function

UR - http://www.scopus.com/inward/record.url?scp=85076257580&partnerID=8YFLogxK

U2 - 10.1016/j.ifacol.2019.09.145

DO - 10.1016/j.ifacol.2019.09.145

M3 - Conference article

AN - SCOPUS:85076257580

SN - 1474-6670

VL - 52

SP - 224

EP - 229

JO - IFAC-PapersOnLine

JF - IFAC-PapersOnLine

IS - 11

T2 - 5th IFAC Conference on Intelligent Control and Automation Sciences, ICONS 2019

Y2 - 21 August 2019 through 23 August 2019

ER -
