Abstract
This paper addresses the problem of deriving a policy from the value function in reinforcement learning with continuous state and input spaces. We propose a novel method based on genetic programming to construct a symbolic function that serves as a proxy for the value function and from which a continuous policy is derived. The symbolic proxy function is constructed so that it maximizes the number of correct control-input choices over a set of selected states. Maximization methods can then be used to derive a control policy that performs better than the policy derived from the original approximate value function. The method was experimentally evaluated on two control problems with continuous spaces, pendulum swing-up and magnetic manipulation, and compared to a standard policy-derivation method that uses the value function approximation. The results show that the proposed method and its variants outperform the standard method.
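As a rough illustration of the two ingredients the abstract mentions, the sketch below shows (i) the standard derivation of a control input by maximizing the one-step return under an approximate value function, and (ii) a fitness measure in the spirit of the proposed genetic-programming method, counting correct input choices over a set of selected states. The paper does not publish code, so every name here (`f`, `rho`, `v_hat`, the input grid, the discount factor, and how the correct inputs are obtained) is an illustrative assumption, and the grid maximization is only the simplest stand-in for the maximization step that yields the continuous policy.

```python
import numpy as np

GAMMA = 0.99                         # discount factor (assumed value)
U_GRID = np.linspace(-2.0, 2.0, 41)  # discretized candidate input set (assumed range)

def derive_input(x, value_fn, f, rho):
    """Standard policy derivation: over the candidate inputs, pick the one
    maximizing the one-step reward plus the discounted value of the
    successor state, u* = argmax_u [rho(x, u) + gamma * V(f(x, u))]."""
    returns = [rho(x, u) + GAMMA * value_fn(f(x, u)) for u in U_GRID]
    return U_GRID[int(np.argmax(returns))]

def proxy_fitness(proxy_fn, states, correct_inputs, f, rho):
    """Fitness in the spirit of the abstract: the fraction of selected states
    for which the input derived from a candidate symbolic proxy matches the
    known correct input (how correct inputs are obtained is paper-specific)."""
    hits = sum(
        np.isclose(derive_input(x, proxy_fn, f, rho), u_star)
        for x, u_star in zip(states, correct_inputs)
    )
    return hits / len(states)

if __name__ == "__main__":
    # Toy 1-D system: linear dynamics, quadratic reward, quadratic value guess.
    f = lambda x, u: 0.9 * x + 0.1 * u
    rho = lambda x, u: -x**2 - 0.01 * u**2
    v_hat = lambda x: -x**2              # stand-in approximate value function
    print(derive_input(1.0, v_hat, f, rho))
```

A genetic-programming loop would evolve candidate expressions for `proxy_fn` and rank them by `proxy_fitness`, which matches the abstract's criterion of maximizing the number of correct input choices rather than fitting the value function itself.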
Original language | English |
---|---|
Title of host publication | Proceedings of the 2016 IEEE 55th Conference on Decision and Control (CDC) |
Editors | Francesco Bullo, Christophe Prieur, Alessandro Giua |
Place of Publication | Piscataway, NJ, USA |
Publisher | IEEE |
Pages | 2789-2795 |
ISBN (Print) | 978-1-5090-1837-6 |
DOIs | |
Publication status | Published - 2016 |
Event | 55th IEEE Conference on Decision and Control, CDC 2016 - Las Vegas, United States |
Duration | 12 Dec 2016 → 14 Dec 2016 |
Conference
Conference | 55th IEEE Conference on Decision and Control, CDC 2016 |
---|---|
Abbreviated title | CDC 2016 |
Country/Territory | United States |
City | Las Vegas |
Period | 12/12/16 → 14/12/16 |
Bibliographical note
Accepted Author Manuscript
Keywords
- Genetic programming
- Sociology
- Statistics
- Learning (artificial intelligence)
- Standards
- Cybernetics
- Trajectory