Policy derivation methods for critic-only reinforcement learning in continuous action spaces

Eduard Alibekov, Jiri Kubalik, Robert Babuska

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

4 Citations (Scopus)

Abstract

State-of-the-art critic-only reinforcement learning methods can deal with a small discrete action space. The most common approach to real-world problems with continuous actions is to discretize the action space. In this paper a method is proposed to derive a continuous-action policy based on a value function that has been computed for discrete actions by using any known algorithm such as value iteration. Several variants of the policy-derivation algorithm are introduced and compared on two continuous state-action benchmarks: double pendulum swing-up and 3D mountain car.
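The idea of deriving a continuous action from values known only on a discrete action grid can be illustrated with a minimal sketch. It assumes that, for a single state, the action values have already been evaluated on a sorted discrete grid (for example from a value function computed by value iteration). The parabolic refinement used here is a generic illustrative choice and is not claimed to be any of the paper's policy-derivation variants. The name `refine_action` and the toy value profile are hypothetical.

```python
def refine_action(actions, q_values):
    """Refine the best discrete action into a continuous one.

    Illustrative assumption: fit a parabola through the best discrete
    action and its two neighbours and return the parabola's maximiser.
    `actions` must be sorted in ascending order.
    """
    # Index of the greedy discrete action.
    i = max(range(len(q_values)), key=lambda k: q_values[k])

    # At the edge of the grid there is no neighbour on one side,
    # so fall back to the discrete action itself.
    if i == 0 or i == len(actions) - 1:
        return float(actions[i])

    a, b, c = actions[i - 1], actions[i], actions[i + 1]
    fa, fb, fc = q_values[i - 1], q_values[i], q_values[i + 1]

    # Vertex of the parabola through (a, fa), (b, fb), (c, fc).
    den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    if den <= 0.0:
        # Flat profile: keep the discrete action.
        return float(b)
    num = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
    u = b - 0.5 * num / den

    # Keep the result inside the bracket formed by the two neighbours.
    return float(min(max(u, a), c))


# Toy example: a concave value profile whose true maximiser, 0.3,
# lies between the grid points 0.0 and 0.5.
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
q = [-(u - 0.3) ** 2 for u in grid]
u_star = refine_action(grid, q)
```

In this toy case the greedy discrete action is 0.5, while the refined action recovers the maximiser of the underlying quadratic profile, which is the kind of gain a continuous-action policy can offer over a purely discretized one.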

Original language: English
Title of host publication: IFAC-PapersOnLine
Subtitle of host publication: Proceedings of the 4th IFAC Conference on Intelligent Control and Automation Sciences (ICONS 2016)
Editors: K Guelton, B Grabot, Z Lendek
Place of Publication: Laxenburg, Austria
Publisher: Elsevier
Pages: 285-290
Volume: 49 - 5
DOIs: https://doi.org/10.1016/j.ifacol.2016.07.127
Publication status: Published - 2016
Event: 4th IFAC Conference on Intelligent Control and Automation Sciences - Reims, France
Duration: 1 Jun 2016 to 3 Jun 2016

Publication series

Name: IFAC-PapersOnline
Publisher: IFAC-Elsevier
Number: 5
Volume: 49
ISSN (Print): 2405-8963

Conference

Conference: 4th IFAC Conference on Intelligent Control and Automation Sciences
Abbreviated title: ICONS 2016
Country: France
City: Reims
Period: 1/06/16 to 3/06/16

Keywords

  • continuous actions
  • multi-variable systems
  • optimal control
  • policy derivation
  • reinforcement learning

Cite this

Alibekov, E., Kubalik, J., & Babuska, R. (2016). Policy derivation methods for critic-only reinforcement learning in continuous action spaces. In K. Guelton, B. Grabot, & Z. Lendek (Eds.), IFAC-PapersOnLine: Proceedings of the 4th IFAC Conference on Intelligent Control and Automation Sciences (ICONS 2016) (Vol. 49 - 5, pp. 285-290). (IFAC-PapersOnline; Vol. 49, No. 5). Elsevier. https://doi.org/10.1016/j.ifacol.2016.07.127