Best-Response Bayesian Reinforcement Learning with Bayes-adaptive POMDPs for Centaurs

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

2 Citations (Scopus)
8 Downloads (Pure)

Abstract

Centaurs are half-human, half-AI decision-makers in which the AI's goal is to complement the human. To do so, the AI must be able to recognize the goals and constraints of the human and have the means to help them. We present a novel formulation of the interaction between the human and the AI as a sequential game where the agents are modelled using Bayesian best-response models. We show that in this case the AI's problem of helping bounded-rational humans make better decisions reduces to a Bayes-adaptive POMDP. In our simulated experiments, we consider an instantiation of our framework for humans who are subjectively optimistic about the AI's future behaviour. Our results show that when equipped with a model of the human, the AI can infer the human's bounds and nudge them towards better decisions. We also discuss ways in which the machine can learn to overcome its own limitations with the help of the human. We identify a novel trade-off for centaurs in partially observable tasks: for the AI's actions to be acceptable to the human, the machine must make sure their beliefs are sufficiently aligned, but aligning beliefs might be costly. We present a preliminary theoretical analysis of this trade-off and its dependence on task structure.

Original language: English
Title of host publication: International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022
Publisher: International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
Pages: 235-243
Number of pages: 9
ISBN (Electronic): 978-171385433-3
Publication status: Published - 2022
Event: 21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022 - Auckland, Virtual, New Zealand
Duration: 9 May 2022 to 13 May 2022

Publication series

Name: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
Volume: 1
ISSN (Print): 1548-8403
ISSN (Electronic): 1558-2914

Conference

Conference: 21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022
Country/Territory: New Zealand
City: Auckland, Virtual
Period: 9/05/22 to 13/05/22

Bibliographical note

Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

  • Bayesian Reinforcement Learning
  • Computational Rationality
  • Hybrid Intelligence
  • Multiagent Learning
