Abstraction-Guided Policy Recovery from Expert Demonstrations

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

How can we plan efficiently in real time to control an agent in a complex environment that may involve many other agents? While existing sample-based planners have enjoyed empirical success in large POMDPs, their performance heavily relies on a fast simulator. However, real-world scenarios are complex in nature and their simulators are often computationally demanding, which severely limits the performance of online planners. In this work, we propose influence-augmented online planning, a principled method to transform a factored simulator of the entire environment into a local simulator that samples only the state variables that are most relevant to the observation and reward of the planning agent and captures the incoming influence from the rest of the environment using machine learning methods. Our main experimental results show that planning on this less accurate but much faster local simulator with POMCP leads to higher real-time planning performance than planning on the simulator that models the entire environment.
Original language: English
Title of host publication: 31st International Conference on Automated Planning and Scheduling
Publisher: American Association for Artificial Intelligence (AAAI)
Pages: 560-568
Number of pages: 9
Publication status: Published - 2021
Event: 31st International Conference on Automated Planning and Scheduling - Virtual/online event
Duration: 7 Jun 2021 to 12 Jun 2021
Conference number: 31

Conference

Conference: 31st International Conference on Automated Planning and Scheduling
Abbreviated title: ICAPS 2021
Period: 7/06/21 to 12/06/21
