Symbolic method for deriving policy in reinforcement learning

Eduard Alibekov, Jiří Kubalík, Robert Babuska

    Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review



    This paper addresses the problem of deriving a policy from the value function in the context of reinforcement learning in continuous state and input spaces. We propose a novel method based on genetic programming to construct a symbolic function, which serves as a proxy to the value function and from which a continuous policy is derived. The symbolic proxy function is constructed such that it maximizes the number of correct choices of the control input for a set of selected states. Maximization methods can then be used to derive a control policy that performs better than the policy derived from the original approximate value function. The method was experimentally evaluated on two control problems with continuous spaces, pendulum swing-up and magnetic manipulation, and compared to a standard policy derivation method using the value function approximation. The results show that the proposed method and its variants outperform the standard method.
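    The idea of counting correct input choices can be illustrated with a minimal sketch. It is not the authors' implementation: the toy 1-D dynamics, the reward, the discretized input set, the one-step-lookahead rule, and the names `step`, `reward`, `greedy_action`, and `count_correct_choices` are all assumptions made for illustration. A candidate proxy function is scored by the number of selected states at which the input it induces matches a given reference input; in a genetic programming setting, such a count could serve as the fitness of a candidate expression.

```python
import numpy as np

GAMMA = 0.95                          # discount factor (assumed value)
ACTIONS = np.linspace(-1.0, 1.0, 5)   # hypothetical discretized control inputs


def step(x, u):
    """Toy 1-D dynamics (assumption): shift the state by 0.1 * u, clipped to [-1, 1]."""
    return float(np.clip(x + 0.1 * u, -1.0, 1.0))


def reward(x, u):
    """Toy reward (assumption): penalize the distance of the next state from the origin."""
    return -abs(step(x, u))


def greedy_action(value_fn, x):
    """One-step lookahead: choose the input maximizing r(x, u) + GAMMA * value_fn(next state)."""
    scores = [reward(x, u) + GAMMA * value_fn(step(x, u)) for u in ACTIONS]
    return float(ACTIONS[int(np.argmax(scores))])


def count_correct_choices(proxy_fn, states, reference_actions):
    """Number of states where the input derived from proxy_fn equals the reference input."""
    return sum(
        greedy_action(proxy_fn, x) == u_ref
        for x, u_ref in zip(states, reference_actions)
    )


if __name__ == "__main__":
    states = [-0.5, 0.5]
    reference_actions = [1.0, -1.0]   # inputs that move each state toward the origin
    print(count_correct_choices(lambda x: -x * x, states, reference_actions))
    print(count_correct_choices(lambda x: 2.0 * x, states, reference_actions))
```

    In this toy setting, a proxy shaped like `-x * x` induces the reference input at both states, while a proxy such as `2.0 * x` does not, so the count separates the two candidates.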
    Original language: English
    Title of host publication: Proceedings of the 2016 IEEE 55th Conference on Decision and Control (CDC)
    Editors: Francesco Bullo, Christophe Prieur, Alessandro Giua
    Place of Publication: Piscataway, NJ, USA
    ISBN (Print): 978-1-5090-1837-6
    Publication status: Published - 2016
    Event: 55th IEEE Conference on Decision and Control, CDC 2016 - Las Vegas, United States
    Duration: 12 Dec 2016 → 14 Dec 2016


    Conference: 55th IEEE Conference on Decision and Control, CDC 2016
    Abbreviated title: CDC 2016
    Country/Territory: United States
    City: Las Vegas

    Bibliographical note

    Accepted Author Manuscript


    • Genetic programming
    • Sociology
    • Statistics
    • Learning (artificial intelligence)
    • Standards
    • Cybernetics
    • Trajectory

