PEBL: Pessimistic Ensembles for Offline Deep Reinforcement Learning

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review


Abstract

Offline reinforcement learning (RL), or learning from a fixed data set, is an attractive alternative to online RL. Offline RL promises to address the cost and safety implications of taking numerous random or bad actions online, a crucial aspect of traditional RL that makes it difficult to apply in real-world problems. However, when RL is naïvely applied to a fixed data set, the resulting policy may exhibit poor performance in the real environment. This happens due to over-estimation of the value of state-action pairs not sufficiently covered by the data set. A promising way to avoid this is by applying pessimism and acting according to a lower-bound estimate on the value. It has been shown that penalizing the learned value according to a pessimistic bound on the uncertainty can drastically improve offline RL. In deep reinforcement learning, however, uncertainty estimation is highly non-trivial and the development of effective uncertainty-based pessimistic algorithms remains an open question. This paper introduces two novel offline deep RL methods built on Double Deep Q-Learning and Soft Actor-Critic. We show how a multi-headed bootstrap approach to uncertainty estimation is used to calculate an effective pessimistic value penalty. Our approach is applied to benchmark offline deep RL domains, where we demonstrate that our methods can often beat the current state-of-the-art.
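The abstract describes penalizing the learned value with a pessimistic bound derived from a multi-headed ensemble. A minimal sketch of this idea (not the authors' actual implementation; the function name, the use of NumPy, and the mean-minus-scaled-standard-deviation form of the penalty are illustrative assumptions) is:

```python
import numpy as np

def pessimistic_value(q_heads: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Compute a pessimistic (lower-bound) value estimate from an ensemble.

    q_heads: array of shape (K, batch) holding Q-value estimates from K
             bootstrap heads for each state-action pair in the batch.
    beta:    pessimism coefficient scaling the uncertainty penalty
             (hypothetical hyperparameter for this sketch).
    """
    mean = q_heads.mean(axis=0)      # ensemble mean value
    std = q_heads.std(axis=0)        # ensemble disagreement as uncertainty
    return mean - beta * std         # penalize uncertain state-action pairs

# Example: three heads agree on the first action but disagree on the
# second, so only the second action receives a large penalty.
q_heads = np.array([[1.0, 1.0],
                    [1.0, 3.0],
                    [1.0, 5.0]])
print(pessimistic_value(q_heads, beta=1.0))
```

State-action pairs poorly covered by the data set tend to produce disagreement across bootstrap heads, so acting on this penalized estimate steers the policy away from over-estimated, out-of-distribution actions.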
Original language: English
Title of host publication: Robust and Reliable Autonomy in the Wild Workshop at the 30th International Joint Conference on Artificial Intelligence
Number of pages: 10
Publication status: Published - 2021
Event: Robust and Reliable Autonomy in the Wild Workshop at the 30th International Joint Conference on Artificial Intelligence
Duration: 19 Aug 2021 → 19 Aug 2021

Workshop

Workshop: Robust and Reliable Autonomy in the Wild Workshop at the 30th International Joint Conference on Artificial Intelligence
Abbreviated title: IJCAI 2021 Workshop
Period: 19/08/21 → 19/08/21
