Policy derivation methods for critic-only reinforcement learning in continuous action spaces

Eduard Alibekov, Jiri Kubalik, Robert Babuska

    Research output: Chapter in Book/Conference proceedings/Edited volume, Conference contribution, Scientific, peer-review

    5 Citations (Scopus)

    Abstract

    State-of-the-art critic-only reinforcement learning methods can deal only with small, discrete action spaces. The most common approach to real-world problems with continuous actions is therefore to discretize the action space. In this paper, a method is proposed for deriving a continuous-action policy from a value function that has been computed for discrete actions by any known algorithm, such as value iteration. Several variants of the policy-derivation algorithm are introduced and compared on two benchmarks with continuous states and actions: double pendulum swing-up and 3D mountain car.
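    As a rough illustration of the idea, the sketch below computes a value function V on a state grid by value iteration restricted to three discrete actions, and then derives a continuous-action policy by maximizing the one-step look-ahead r(x, u) + γ V(f(x, u)) over a dense set of candidate actions, with V linearly interpolated between grid points. The 1-D toy problem, the dense-grid search, and all names (`value_iteration`, `derive_action`, `interp_value`, etc.) are hypothetical choices made for this example; they are not the paper's benchmarks and are not claimed to match any of the variants it compares.

```python
import bisect

# Hypothetical 1-D toy problem (an assumption, not one of the paper's benchmarks):
# state x in [-1, 1], dynamics x' = clip(x + 0.1*u), reward r = -x'^2.
GAMMA = 0.95
N_STATES = 21
XS = [-1.0 + 2.0 * i / (N_STATES - 1) for i in range(N_STATES)]  # state grid, step 0.1
DISCRETE_ACTIONS = [-1.0, 0.0, 1.0]  # coarse action set used by value iteration
FINE_ACTIONS = [-1.0 + 2.0 * k / 200 for k in range(201)]  # dense candidate set, step 0.01


def step(x, u):
    """Deterministic dynamics and reward of the toy problem."""
    x_next = min(max(x + 0.1 * u, -1.0), 1.0)
    return x_next, -x_next ** 2


def interp_value(v, x):
    """Piecewise-linear interpolation of the grid values v at state x."""
    x = min(max(x, XS[0]), XS[-1])
    i = min(max(bisect.bisect_right(XS, x) - 1, 0), len(XS) - 2)
    t = (x - XS[i]) / (XS[i + 1] - XS[i])
    return (1.0 - t) * v[i] + t * v[i + 1]


def q_value(v, x, u):
    """One-step look-ahead: r(x, u) + gamma * V(f(x, u))."""
    x_next, r = step(x, u)
    return r + GAMMA * interp_value(v, x_next)


def value_iteration(n_iter=500):
    """Compute V on the state grid using only the discrete actions."""
    v = [0.0] * N_STATES
    for _ in range(n_iter):
        v = [max(q_value(v, x, u) for u in DISCRETE_ACTIONS) for x in XS]
    return v


def derive_action(v, x, actions=FINE_ACTIONS):
    """Continuous-action policy: maximize the look-ahead over a dense action set."""
    return max(actions, key=lambda u: q_value(v, x, u))


V = value_iteration()
print(derive_action(V, 0.05))
```

    Here `derive_action(V, 0.05)` returns -0.5, an action outside the discrete set used to compute V, which steers the state exactly to the origin; each of the three discrete actions would leave the state at -0.05, 0.05 or 0.15. A dense grid search is used only because it is simple and exhaustive; a local optimizer over u would be an equally plausible choice for higher-dimensional action spaces.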

    Original language: English
    Title of host publication: IFAC-PapersOnLine
    Subtitle of host publication: Proceedings of the 4th IFAC Conference on Intelligent Control and Automation Sciences (ICONS 2016)
    Editors: K Guelton, B Grabot, Z Lendek
    Place of Publication: Laxenburg, Austria
    Publisher: Elsevier
    Pages: 285-290
    Volume: 49 (5)
    Publication status: Published, 2016
    Event: 4th IFAC Conference on Intelligent Control and Automation Sciences, Reims, France
    Duration: 1 Jun 2016 to 3 Jun 2016

    Publication series

    Name: IFAC-PapersOnline
    Publisher: IFAC-Elsevier
    Number: 5
    Volume: 49
    ISSN (Print): 2405-8963

    Conference

    Conference: 4th IFAC Conference on Intelligent Control and Automation Sciences
    Abbreviated title: ICONS 2016
    Country/Territory: France
    City: Reims
    Period: 1/06/16 to 3/06/16

    Keywords

    • continuous actions
    • multi-variable systems
    • optimal control
    • policy derivation
    • reinforcement learning
