Flexible Heuristic Dynamic Programming for Reinforcement Learning in Quadrotors

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review



Reinforcement learning is a paradigm for learning decision-making tasks from interaction with the environment. Function approximators mitigate part of the curse of dimensionality when learning in high-dimensional state and/or action spaces. Even so, learning a good policy directly in a high-dimensional state space can be time-consuming. A method is proposed that initially limits the state and action space to a subset of the variables of the Markov Decision Process, so that the agent first learns a coarse policy. The agent is then gradually exposed to new state and action variables, increasing the dimensionality of the state and action space until it matches that of the control problem. A local function approximator has been developed that supports this expansion of the state and action space. The concept is applied to the Model-Learning Actor-Critic, a model-based Heuristic Dynamic Programming algorithm. Its functioning is demonstrated by training a reinforcement learning agent for 2-dimensional hover control of a Parrot AR 2.0 quadrotor. The agent is shown to learn faster and to achieve a better policy when exposed to the state and action variables gradually than when exposed to all of them from the start.
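The idea of a local function approximator that can grow with the state space can be illustrated with a small sketch. The code below is not the approximator developed in the paper; it is a hypothetical piecewise-constant grid (the class name `ExpandableGridApproximator` and all of its methods and parameters are assumptions made for illustration). It shows one plausible way to add an input variable while keeping what was already learned: existing estimates are copied along the new dimension, so the coarse estimate remains the starting point for refinement.

```python
import numpy as np


class ExpandableGridApproximator:
    """Piecewise-constant (local) approximator on a regular grid.

    Hypothetical illustration only; not the approximator from the paper.
    """

    def __init__(self, lows, highs, bins):
        self.lows = [float(v) for v in lows]
        self.highs = [float(v) for v in highs]
        self.bins = [int(b) for b in bins]
        self.values = np.zeros(self.bins)  # one estimate per grid cell

    def _index(self, state):
        # Map each state variable to the index of the cell that contains it.
        idx = []
        for x, lo, hi, n in zip(state, self.lows, self.highs, self.bins):
            frac = (x - lo) / (hi - lo)
            idx.append(int(np.clip(np.floor(frac * n), 0, n - 1)))
        return tuple(idx)

    def predict(self, state):
        return float(self.values[self._index(state)])

    def update(self, state, target, lr=0.5):
        # Local update: only the cell containing `state` changes.
        i = self._index(state)
        self.values[i] += lr * (target - self.values[i])

    def expand(self, low, high, bins):
        # Add one input variable; existing estimates are copied along it.
        self.values = np.repeat(self.values[..., np.newaxis], bins, axis=-1)
        self.lows.append(float(low))
        self.highs.append(float(high))
        self.bins.append(int(bins))


# Learn on one variable first, then expose a second one.
approx = ExpandableGridApproximator(lows=[-1.0], highs=[1.0], bins=[4])
approx.update([0.3], target=2.0, lr=1.0)
approx.expand(low=-1.0, high=1.0, bins=3)
coarse_estimate = approx.predict([0.3, -0.9])  # inherited from the 1-D stage
```

Because every update touches a single cell, later refinement along the new variable changes only the cells that are visited and leaves the inherited coarse estimates elsewhere intact. An actor-critic agent would keep separate approximators of this kind for the actor, the critic, and the learned model; that wiring is omitted here.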
Original language: English
Title of host publication: Proceedings of the 2018 AIAA Information Systems-AIAA Infotech @ Aerospace
Publisher: American Institute of Aeronautics and Astronautics Inc. (AIAA)
Number of pages: 19
ISBN (Electronic): 978-1-62410-527-2
Publication status: Published - 2018
Event: AIAA Information Systems-AIAA Infotech at Aerospace, 2018 - Kissimmee, United States
Duration: 8 Jan 2018 to 12 Jan 2018


Conference: AIAA Information Systems-AIAA Infotech at Aerospace, 2018
Country: United States


