The Actor-Judge Method: Safe state exploration for Hierarchical Reinforcement Learning Controllers

Stephen Verbist, Tommaso Mannucci, Erik-Jan van Kampen

Research output: Conference contribution (Chapter in Book/Conference proceedings/Edited volume), Scientific, peer-reviewed



Reinforcement Learning is a much-researched topic for autonomous machine behavior and is often applied to navigation problems. In order to deal with growing environments and larger state/action spaces, Hierarchical Reinforcement Learning has been introduced. Unfortunately, learning from experience, which is central to Reinforcement Learning, makes guaranteeing safety a complex problem. This paper demonstrates an approach, named the actor-judge approach, to make exploration safer while imposing as few restrictions as possible on the agent. The approach combines ideas from the fields of Hierarchical Reinforcement Learning and Safe Reinforcement Learning to develop a Safe Hierarchical Reinforcement Learning algorithm. The algorithm is tested in a simulated environment where the agent represents an Unmanned Aerial Vehicle able to move laterally in four directions, using quadridirectional range sensors to establish a relative position. Although this approach does not guarantee that the agent never explores unsafe areas of the state domain, results show that the actor-judge method increases agent safety and can be used on multiple levels of an HRL agent hierarchy.
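The abstract's core idea, a judge that vetoes an actor's proposed actions when the predicted next state would be unsafe, can be sketched as follows. This is an illustrative assumption of how such a filter might look, not the paper's actual algorithm: the corridor bounds, safety margin, and all function names (`actor`, `judge`, `safe_step`, `ranges`) are hypothetical.

```python
import random

# Hypothetical actor-judge safety filter (illustrative sketch, not the
# paper's algorithm). The actor proposes a lateral move; the judge vetoes
# it when the predicted position would come closer to a wall than a
# safety margin, and a safe fallback action is chosen instead.

ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
WALLS_X = (0, 10)   # assumed environment bounds sensed by range sensors
WALLS_Y = (0, 10)
SAFETY_MARGIN = 1   # assumed minimum allowed distance to a wall

def ranges(pos):
    """Quadridirectional range-sensor readings from position pos."""
    x, y = pos
    return {"left": x - WALLS_X[0], "right": WALLS_X[1] - x,
            "down": y - WALLS_Y[0], "up": WALLS_Y[1] - y}

def actor(pos):
    """Exploring actor: here simply proposes a random lateral action."""
    return random.choice(list(ACTIONS))

def judge(pos, action):
    """Judge: approve the action only if the predicted next state is safe."""
    dx, dy = ACTIONS[action]
    nxt = (pos[0] + dx, pos[1] + dy)
    return all(r >= SAFETY_MARGIN for r in ranges(nxt).values())

def safe_step(pos):
    """Apply the judge's veto; stay put if no proposed action is safe."""
    action = actor(pos)
    if not judge(pos, action):
        safe = [a for a in ACTIONS if judge(pos, a)]
        if not safe:
            return pos, None          # no safe action available
        action = random.choice(safe)
    dx, dy = ACTIONS[action]
    return (pos[0] + dx, pos[1] + dy), action

# Exploration stays inside the safe region regardless of the actor's proposals.
pos = (5, 5)
for _ in range(100):
    pos, _ = safe_step(pos)
assert all(r >= SAFETY_MARGIN for r in ranges(pos).values())
```

Because the same veto logic wraps any action-proposing policy, a filter of this shape could in principle be applied at each level of an HRL hierarchy, which is the multi-level use the abstract reports.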
Original language: English
Title of host publication: Proceedings of the 2018 AIAA Information Systems-AIAA Infotech @ Aerospace
Publisher: American Institute of Aeronautics and Astronautics Inc. (AIAA)
Number of pages: 21
ISBN (Electronic): 978-1-62410-527-2
Publication status: Published - 2018
Event: AIAA Information Systems-AIAA Infotech at Aerospace, 2018 - Kissimmee, United States
Duration: 8 Jan 2018 - 12 Jan 2018


Conference: AIAA Information Systems-AIAA Infotech at Aerospace, 2018
Country: United States


