Optimal control via reinforcement learning with symbolic policy approximation

Jiří Kubalík; Eduard Alibekov; Robert Babuška

doi:10.1016/j.ifacol.2017.08.805

Learning & Autonomous Control

Research output: Contribution to journal, conference article (scientific, peer-reviewed)

11 Citations (Scopus)

66 Downloads (Pure)

Abstract

Model-based reinforcement learning (RL) algorithms can be used to derive optimal control laws for nonlinear dynamic systems. With continuous-valued state and input variables, RL algorithms have to rely on function approximators to represent the value function and policy mappings. This paper addresses the problem of finding a smooth policy based on the value function represented by means of a basis-function approximator. We first show that policies derived directly from the value function or represented explicitly by the same type of approximator lead to inferior control performance, manifested by non-smooth control signals and steady-state errors. We then propose a novel method to construct a smooth policy represented by an analytic equation, obtained by means of symbolic regression. The proposed method is illustrated on a reference-tracking problem of a 1-DOF robot arm operating under the influence of gravity. The results show that the analytic control law performs at least equally well as the original numerically approximated policy, while it leads to much smoother control signals. In addition, the analytic function is readable (as opposed to black-box approximators) and can be used in further analysis and synthesis of the closed loop.
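The abstract's first observation, that a policy derived directly from a basis-function value function gives non-smooth control, can be illustrated with the minimal sketch below. It is not the paper's setup: the normalized 1-DOF arm model (angle assumed to be measured from the downward vertical), the parameters J, MGL, B, DT, the reference ALPHA_REF, the quadratic tracking reward, the 41×41 grid and the discrete action set U_SET are all hypothetical choices. Value iteration fits one weight per grid node of a triangular (bilinear) basis-function approximator, and the policy is then obtained by picking the best action from U_SET, so the control signal can only jump between those discrete values. The proposed remedy, symbolic regression via genetic programming, is not reproduced here.

```python
import numpy as np

# Hypothetical normalized 1-DOF arm (assumed values, not the paper's model):
#   J * alpha_ddot = -MGL * sin(alpha) - B * omega + u
J, MGL, B, DT = 1.0, 1.0, 0.5, 0.05
ALPHA_REF = np.pi / 4                 # assumed reference angle [rad]
GAMMA = 0.95                          # discount factor
U_SET = np.linspace(-2.0, 2.0, 9)     # discrete action set: -2.0, -1.5, ..., 2.0

# Grid nodes; each node carries one triangular (pyramid) basis function.
A_GRID = np.linspace(-np.pi, np.pi, 41)
W_GRID = np.linspace(-4.0, 4.0, 41)


def step(alpha, omega, u):
    """Semi-implicit Euler step; the state is clipped to the grid (hypothetical hard limits)."""
    omega_next = np.clip(omega + DT * (-MGL * np.sin(alpha) - B * omega + u) / J,
                         W_GRID[0], W_GRID[-1])
    alpha_next = np.clip(alpha + DT * omega_next, A_GRID[0], A_GRID[-1])
    return alpha_next, omega_next


def value(theta, alpha, omega):
    """V(x) = sum_i phi_i(x) * theta_i, i.e. bilinear interpolation of the node weights."""
    ia = np.clip(np.searchsorted(A_GRID, alpha) - 1, 0, len(A_GRID) - 2)
    iw = np.clip(np.searchsorted(W_GRID, omega) - 1, 0, len(W_GRID) - 2)
    ta = (alpha - A_GRID[ia]) / (A_GRID[ia + 1] - A_GRID[ia])
    tw = (omega - W_GRID[iw]) / (W_GRID[iw + 1] - W_GRID[iw])
    return ((1 - ta) * (1 - tw) * theta[ia, iw] + ta * (1 - tw) * theta[ia + 1, iw]
            + (1 - ta) * tw * theta[ia, iw + 1] + ta * tw * theta[ia + 1, iw + 1])


def q_values(theta, alpha, omega):
    """Q(x, u) = r(x') + GAMMA * V(x') for every u in U_SET (stacked on the last axis)."""
    qs = []
    for u in U_SET:
        a2, w2 = step(alpha, omega, u)
        reward = -((a2 - ALPHA_REF) ** 2 + 0.01 * w2 ** 2)   # assumed tracking reward
        qs.append(reward + GAMMA * value(theta, a2, w2))
    return np.stack(qs, axis=-1)


def value_iteration(n_iter=300):
    """Fit the node weights theta by repeated Bellman backups at the grid nodes."""
    aa, ww = np.meshgrid(A_GRID, W_GRID, indexing="ij")
    theta = np.zeros_like(aa)
    for _ in range(n_iter):
        theta = q_values(theta, aa, ww).max(axis=-1)
    return theta


def greedy_policy(theta, alpha, omega):
    """Policy derived directly from V: the best action from the discrete set U_SET."""
    q = q_values(theta, np.atleast_1d(alpha), np.atleast_1d(omega))
    return U_SET[np.argmax(q, axis=-1)]


theta = value_iteration()

# Closed-loop simulation from the hanging position (alpha = 0, omega = 0).
alpha, omega, us = 0.0, 0.0, []
for _ in range(200):
    u = float(greedy_policy(theta, alpha, omega)[0])
    us.append(u)
    a2, w2 = step(alpha, omega, u)
    alpha, omega = float(a2), float(w2)

print("distinct control values:", sorted(set(us)))
print("final angle error:", abs(alpha - ALPHA_REF))
```

Under these assumptions, the torque needed to hold the arm at rest at ALPHA_REF, MGL * sin(ALPHA_REF) ≈ 0.71, is not an element of U_SET, so this greedy controller cannot hold the reference with a constant input: it has to switch between neighbouring actions or settle with an offset, the kind of behaviour the abstract describes.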

Original language	English
Pages (from-to)	4162-4167
Journal	IFAC-PapersOnLine
Volume	50
Issue number	1
DOIs	https://doi.org/10.1016/j.ifacol.2017.08.805
Publication status	Published - 2017
Event	20th World Congress of the International Federation of Automatic Control (IFAC), 2017 - Toulouse, France
Duration	9 Jul 2017 → 14 Jul 2017
Conference number	20
URL	https://www.ifac2017.org

Keywords

genetic programming
nonlinear model-based control
optimal control
reinforcement learning
symbolic regression
value iteration

Cite this

@article{5679b1a19a8a42e4b8d8012296cea0a0,
  title = "Optimal control via reinforcement learning with symbolic policy approximation",
  abstract = "Model-based reinforcement learning (RL) algorithms can be used to derive optimal control laws for nonlinear dynamic systems. With continuous-valued state and input variables, RL algorithms have to rely on function approximators to represent the value function and policy mappings. This paper addresses the problem of finding a smooth policy based on the value function represented by means of a basis-function approximator. We first show that policies derived directly from the value function or represented explicitly by the same type of approximator lead to inferior control performance, manifested by non-smooth control signals and steady-state errors. We then propose a novel method to construct a smooth policy represented by an analytic equation, obtained by means of symbolic regression. The proposed method is illustrated on a reference-tracking problem of a 1-DOF robot arm operating under the influence of gravity. The results show that the analytic control law performs at least equally well as the original numerically approximated policy, while it leads to much smoother control signals. In addition, the analytic function is readable (as opposed to black-box approximators) and can be used in further analysis and synthesis of the closed loop.",
  keywords = "genetic programming, nonlinear model-based control, optimal control, reinforcement learning, symbolic regression, value iteration",
  author = "Ji{\v r}{\'i} Kubal{\'i}k and Eduard Alibekov and Robert Babu{\v s}ka",
  year = "2017",
  doi = "10.1016/j.ifacol.2017.08.805",
  language = "English",
  volume = "50",
  pages = "4162--4167",
  journal = "IFAC-PapersOnLine",
  issn = "2405-8963",
  publisher = "Elsevier",
  number = "1",
  note = "20th World Congress of the International Federation of Automatic Control (IFAC), 2017, IFAC 2017; Conference date: 09-07-2017 Through 14-07-2017",
  url = "https://www.ifac2017.org",
}

TY  - JOUR
T1  - Optimal control via reinforcement learning with symbolic policy approximation
AU  - Kubalík, Jiří
AU  - Alibekov, Eduard
AU  - Babuška, Robert
N1  - Conference code: 20
PY  - 2017
Y1  - 2017
N2  - Model-based reinforcement learning (RL) algorithms can be used to derive optimal control laws for nonlinear dynamic systems. With continuous-valued state and input variables, RL algorithms have to rely on function approximators to represent the value function and policy mappings. This paper addresses the problem of finding a smooth policy based on the value function represented by means of a basis-function approximator. We first show that policies derived directly from the value function or represented explicitly by the same type of approximator lead to inferior control performance, manifested by non-smooth control signals and steady-state errors. We then propose a novel method to construct a smooth policy represented by an analytic equation, obtained by means of symbolic regression. The proposed method is illustrated on a reference-tracking problem of a 1-DOF robot arm operating under the influence of gravity. The results show that the analytic control law performs at least equally well as the original numerically approximated policy, while it leads to much smoother control signals. In addition, the analytic function is readable (as opposed to black-box approximators) and can be used in further analysis and synthesis of the closed loop.
AB  - Model-based reinforcement learning (RL) algorithms can be used to derive optimal control laws for nonlinear dynamic systems. With continuous-valued state and input variables, RL algorithms have to rely on function approximators to represent the value function and policy mappings. This paper addresses the problem of finding a smooth policy based on the value function represented by means of a basis-function approximator. We first show that policies derived directly from the value function or represented explicitly by the same type of approximator lead to inferior control performance, manifested by non-smooth control signals and steady-state errors. We then propose a novel method to construct a smooth policy represented by an analytic equation, obtained by means of symbolic regression. The proposed method is illustrated on a reference-tracking problem of a 1-DOF robot arm operating under the influence of gravity. The results show that the analytic control law performs at least equally well as the original numerically approximated policy, while it leads to much smoother control signals. In addition, the analytic function is readable (as opposed to black-box approximators) and can be used in further analysis and synthesis of the closed loop.
KW  - genetic programming
KW  - nonlinear model-based control
KW  - optimal control
KW  - reinforcement learning
KW  - symbolic regression
KW  - value iteration
UR  - http://resolver.tudelft.nl/uuid:5679b1a1-9a8a-42e4-b8d8-012296cea0a0
UR  - http://www.scopus.com/inward/record.url?scp=85031776804&partnerID=8YFLogxK
U2  - 10.1016/j.ifacol.2017.08.805
DO  - 10.1016/j.ifacol.2017.08.805
M3  - Conference article
AN  - SCOPUS:85031776804
SN  - 2405-8963
VL  - 50
SP  - 4162
EP  - 4167
JO  - IFAC-PapersOnLine
JF  - IFAC-PapersOnLine
IS  - 1
T2  - 20th World Congress of the International Federation of Automatic Control (IFAC), 2017
Y2  - 9 July 2017 through 14 July 2017
ER  -
