Abstraction-Guided Policy Recovery from Expert Demonstrations

C.T. Ponnambalam, F.A. Oliehoek, M.T.J. Spaan

Research output: Contribution to conference › Paper › peer-review

Abstract

The goal in behavior cloning is to extract meaningful information from expert demonstrations and reproduce the same behavior autonomously. However, the available data is unlikely to exhaustively cover the potential problem space. As a result, the quality of automated decision making is compromised without elegant ways to handle the encountering of out-of-distribution states that might occur due to unforeseen events in the environment. Our novel approach RECO uses only the offline data available to recover a behavioral cloning agent from unknown states. Given expert trajectories, RECO learns both an imitation policy and recovery policy. Our contribution is a method for learning this recovery policy that steers the agent back to the trajectories in the data set from unknown states. While there is, per definition, no data available to learn the recovery policy, we exploit abstractions to generalize beyond the available data thus overcoming this problem. In a tabular domain, we show how our method results in drastically fewer calls to a human supervisor without compromising solution quality and with few trajectories provided by an expert. We further introduce a continuous adaptation of RECO and evaluate its potential in an experiment.
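
The abstract does not give the algorithm itself, so the following is only a minimal tabular sketch of the three-way decision it describes: imitate in known states, fall back to an abstraction-based policy in unknown states, and call the supervisor otherwise. It is not the RECO method. The grid states, the `abstract` bucketing function, the majority-vote tables, and every name below are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def abstract(state, cell=2):
    """Hypothetical abstraction: bucket (x, y) grid coordinates into coarse cells."""
    x, y = state
    return (x // cell, y // cell)


def majority_table(pairs):
    """Map each key to the action most frequently paired with it."""
    counts = defaultdict(Counter)
    for key, action in pairs:
        counts[key][action] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def fit(trajectories):
    """Build both lookup tables from offline expert (state, action) pairs only."""
    pairs = [(s, a) for traj in trajectories for s, a in traj]
    imitation = majority_table(pairs)  # keyed by concrete state
    recovery = majority_table((abstract(s), a) for s, a in pairs)  # keyed by abstract state
    return imitation, recovery


def act(state, imitation, recovery):
    """Return (action, mode); the action is None when the supervisor must be asked."""
    if state in imitation:
        return imitation[state], "imitate"
    z = abstract(state)
    if z in recovery:
        return recovery[z], "recover"
    return None, "ask_supervisor"


# One made-up expert trajectory on a small grid.
demos = [[((0, 0), "right"), ((1, 0), "right"), ((2, 0), "up"), ((2, 1), "up")]]
imitation, recovery = fit(demos)
```

In this toy setup, `act((1, 0), imitation, recovery)` follows the imitation table, `act((0, 1), imitation, recovery)` is an unseen state that still receives an action because it shares an abstract cell with demonstrated states, and `act((5, 5), imitation, recovery)` falls through to the supervisor. The coarser the assumed abstraction, the fewer states reach that last branch, at the cost of less precise actions.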
Original language: English
Number of pages: 10
Publication status: Published - 2020
Event: Offline Reinforcement Learning Workshop at Neural Information Processing Systems, 2020
Duration: 12 Dec 2020 to 12 Dec 2020
https://offline-rl-neurips.github.io/papers.html

Workshop

Workshop: Offline Reinforcement Learning Workshop at Neural Information Processing Systems, 2020
Period: 12/12/20 to 12/12/20
Other: Virtual/online event due to COVID-19
