Abstract
This paper addresses the problem of deriving a policy from the value function in reinforcement learning with continuous state and input spaces. We propose a novel method based on genetic programming to construct a symbolic function that serves as a proxy for the value function and from which a continuous policy is derived. The symbolic proxy function is constructed so that it maximizes the number of correct control-input choices over a set of selected states. Maximization methods can then be used to derive a control policy that performs better than the policy derived from the original approximate value function. The method was experimentally evaluated on two control problems with continuous spaces, pendulum swing-up and magnetic manipulation, and compared to a standard policy-derivation method that uses the value function approximation. The results show that the proposed method and its variants outperform the standard method.
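As a rough illustration of the two ingredients the abstract mentions, the sketch below shows (i) the standard derivation of a control input by maximizing the one-step return under an approximate value function, and (ii) a fitness measure in the spirit of the proposed genetic-programming method, counting correct input choices over a set of selected states. The paper does not publish code, so every name here (`f`, `rho`, `v_hat`, the input grid, the discount factor, and how the correct inputs are obtained) is an illustrative assumption, and the grid maximization is only the simplest stand-in for the maximization step that yields the continuous policy.

```python
import numpy as np

GAMMA = 0.99                         # discount factor (assumed value)
U_GRID = np.linspace(-2.0, 2.0, 41)  # discretized candidate input set (assumed range)

def derive_input(x, value_fn, f, rho):
    """Standard policy derivation: over the candidate inputs, pick the one
    maximizing the one-step reward plus the discounted value of the
    successor state, u* = argmax_u [rho(x, u) + gamma * V(f(x, u))]."""
    returns = [rho(x, u) + GAMMA * value_fn(f(x, u)) for u in U_GRID]
    return U_GRID[int(np.argmax(returns))]

def proxy_fitness(proxy_fn, states, correct_inputs, f, rho):
    """Fitness in the spirit of the abstract: the fraction of selected states
    for which the input derived from a candidate symbolic proxy matches the
    known correct input (how correct inputs are obtained is paper-specific)."""
    hits = sum(
        np.isclose(derive_input(x, proxy_fn, f, rho), u_star)
        for x, u_star in zip(states, correct_inputs)
    )
    return hits / len(states)

if __name__ == "__main__":
    # Toy 1-D system: linear dynamics, quadratic reward, quadratic value guess.
    f = lambda x, u: 0.9 * x + 0.1 * u
    rho = lambda x, u: -x**2 - 0.01 * u**2
    v_hat = lambda x: -x**2              # stand-in approximate value function
    print(derive_input(1.0, v_hat, f, rho))
```

A genetic-programming loop would evolve candidate expressions for `proxy_fn` and rank them by `proxy_fitness`, which matches the abstract's criterion of maximizing the number of correct input choices rather than fitting the value function itself.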
Original language | English |
---|---|
Title of host publication | Proceedings of the 2016 IEEE 55th Conference on Decision and Control (CDC) |
Editors | Francesco Bullo, Christophe Prieur, Alessandro Giua |
Place of Publication | Piscataway, NJ, USA |
Publisher | IEEE |
Pages | 2789-2795 |
ISBN (Print) | 978-1-5090-1837-6 |
DOIs | |
Publication status | Published - 2016 |
Event | 55th IEEE Conference on Decision and Control, CDC 2016 - Las Vegas, United States |
Duration | 12 Dec 2016 → 14 Dec 2016 |
Conference
Conference | 55th IEEE Conference on Decision and Control, CDC 2016 |
---|---|
Abbreviated title | CDC 2016 |
Country/Territory | United States |
City | Las Vegas |
Period | 12/12/16 → 14/12/16 |
Bibliographical note
Accepted Author Manuscript
Keywords
- Genetic programming
- Sociology
- Statistics
- Learning (artificial intelligence)
- Standards
- Cybernetics
- Trajectory