Symbolic method for deriving policy in reinforcement learning

Eduard Alibekov, Jiří Kubalík, Robert Babuška

    Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

    17 Citations (Scopus)
    114 Downloads (Pure)

    Abstract

    This paper addresses the problem of deriving a policy from the value function in the context of reinforcement learning in continuous state and input spaces. We propose a novel method based on genetic programming to construct a symbolic function, which serves as a proxy to the value function and from which a continuous policy is derived. The symbolic proxy function is constructed such that it maximizes the number of correct choices of the control input for a set of selected states. Maximization methods can then be used to derive a control policy that performs better than the policy derived from the original approximate value function. The method was experimentally evaluated on two control problems with continuous spaces, pendulum swing-up and magnetic manipulation, and compared to a standard policy derivation method using the value function approximation. The results show that the proposed method and its variants outperform the standard method.
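
    The selection criterion described in the abstract, counting the states for which a candidate proxy function leads to the correct control input, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the paper: the one-step greedy rule `reward + gamma * proxy(next state)` over a finite set of candidate inputs, the toy one-dimensional dynamics `step`, the `reward` function, the hand-written `reference_inputs`, and the two hand-written proxies. The genetic programming search that would generate and evolve the candidate proxies is not shown.

```python
def greedy_input(proxy, state, inputs, step, reward, gamma=0.95):
    """Pick the candidate input that maximizes reward + gamma * proxy(next state)."""
    return max(inputs, key=lambda u: reward(state, u) + gamma * proxy(step(state, u)))


def count_correct_choices(proxy, states, reference_inputs, inputs, step, reward, gamma=0.95):
    """Fitness of a proxy: number of states where its greedy input equals the reference."""
    return sum(
        greedy_input(proxy, x, inputs, step, reward, gamma) == u_ref
        for x, u_ref in zip(states, reference_inputs)
    )


# Hypothetical toy problem: move a scalar state x toward 0.
def step(x, u):
    return x + 0.1 * u


def reward(x, u):
    return -x * x


inputs = [-1.0, 0.0, 1.0]                    # finite set of candidate inputs
states = [-1.0, -0.5, 0.5, 1.0]              # selected states
reference_inputs = [1.0, 1.0, -1.0, -1.0]    # assumed "correct" input per state


# Two hand-written candidate proxies; a GP run would evolve these instead.
def good_proxy(x):
    return -abs(x)


def bad_proxy(x):
    return x


fitness_good = count_correct_choices(good_proxy, states, reference_inputs, inputs, step, reward)
fitness_bad = count_correct_choices(bad_proxy, states, reference_inputs, inputs, step, reward)
print(fitness_good, fitness_bad)  # 4 2
```

    In this toy setting the first proxy reproduces the reference input in all four states and the second in only two, so a search that maximizes this count would prefer the first.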
    Original language: English
    Title of host publication: Proceedings of the 2016 IEEE 55th Conference on Decision and Control (CDC)
    Editors: Francesco Bullo, Christophe Prieur, Alessandro Giua
    Place of Publication: Piscataway, NJ, USA
    Publisher: IEEE
    Pages: 2789-2795
    ISBN (Print): 978-1-5090-1837-6
    Publication status: Published - 2016
    Event: 55th IEEE Conference on Decision and Control, CDC 2016 - Las Vegas, United States
    Duration: 12 Dec 2016 → 14 Dec 2016

    Conference

    Conference: 55th IEEE Conference on Decision and Control, CDC 2016
    Abbreviated title: CDC 2016
    Country/Territory: United States
    City: Las Vegas
    Period: 12/12/16 → 14/12/16

    Bibliographical note

    Accepted Author Manuscript

    Keywords

    • Genetic programming
    • Sociology
    • Statistics
    • Learning (artificial intelligence)
    • Standards
    • Cybernetics
    • Trajectory
