Safe Policy Improvement with Baseline Bootstrapping in Factored Environments

Research output: Chapter in Book/Conference proceedings/Edited volume; Conference contribution; Scientific; peer-review

5 Citations (Scopus)

Abstract

We present a novel safe reinforcement learning algorithm that exploits the factored dynamics of the environment to become less conservative. We focus on problem settings in which a policy is already running and the interaction with the environment is limited. In order to safely deploy an updated policy, it is necessary to provide a confidence level regarding its expected performance. However, algorithms for safe policy improvement might require a large number of past experiences to become confident enough to change the agent’s behavior. Factored reinforcement learning, on the other hand, is known to make good use of the data provided. It can achieve a better sample complexity by exploiting independence between features of the environment, but it lacks a confidence level. We study how to improve the sample efficiency of the safe policy improvement with baseline bootstrapping algorithm by exploiting the factored structure of the environment. Our main result is a theoretical bound that is linear in the number of parameters of the factored representation instead of the number of states. The empirical analysis shows that our method can improve the policy using a number of samples potentially one order of magnitude smaller than the flat algorithm.
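
As a rough illustration of why a bound that scales with the number of factored parameters can be far smaller than one that scales with the number of states, the sketch below counts transition-model parameters for a flat tabular model and for a factored one. The binary features, the fixed number of parent features, and the function names are assumptions made for illustration; none of them are taken from the paper.

```python
def flat_param_count(n_features: int, n_actions: int) -> int:
    """Free parameters of a flat tabular transition model over binary features."""
    n_states = 2 ** n_features
    # One categorical distribution over next states per (state, action) pair;
    # each such distribution has n_states - 1 free parameters.
    return n_states * n_actions * (n_states - 1)


def factored_param_count(n_features: int, n_actions: int, n_parents: int) -> int:
    """Free parameters when each binary feature depends on exactly n_parents features."""
    # One Bernoulli parameter per (feature, action, parent assignment).
    return n_features * n_actions * (2 ** n_parents)


if __name__ == "__main__":
    # Hypothetical problem sizes: 2 actions, 3 parents per feature.
    for n in (4, 8, 12):
        print(n, flat_param_count(n, 2), factored_param_count(n, 2, 3))
```

With the number of parents held fixed, the factored count grows linearly in the number of features, while the flat count grows exponentially.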
Original language: English
Title of host publication: 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019
Publisher: American Association for Artificial Intelligence (AAAI)
Pages: 4967-4974
Number of pages: 8
ISBN (Electronic): 9781577358091
Publication status: Published - 2019
Event: The 33rd AAAI Conference on Artificial Intelligence - Honolulu, United States
Duration: 27 Jan 2019 to 1 Feb 2019
Conference number: 33rd

Publication series

Name: 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019

Conference

Conference: The 33rd AAAI Conference on Artificial Intelligence
Country: United States
City: Honolulu
Period: 27/01/19 to 1/02/19
