Best-Response Bayesian Reinforcement Learning with Bayes-adaptive POMDPs for Centaurs

Mustafa Mert Çelikok; Frans A. Oliehoek; Samuel Kaski

Interactive Intelligence

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

Centaurs are half-human, half-AI decision-makers where the AI's goal is to complement the human. To do so, the AI must be able to recognize the goals and constraints of the human and have the means to help them. We present a novel formulation of the interaction between the human and the AI as a sequential game where the agents are modelled using Bayesian best-response models. We show that in this case the AI's problem of helping bounded-rational humans make better decisions reduces to a Bayes-adaptive POMDP. In our simulated experiments, we consider an instantiation of our framework for humans who are subjectively optimistic about the AI's future behaviour. Our results show that when equipped with a model of the human, the AI can infer the human's bounds and nudge them towards better decisions. We discuss ways in which the machine can learn to improve upon its own limitations as well with the help of the human. We identify a novel trade-off for centaurs in partially observable tasks: for the AI's actions to be acceptable to the human, the machine must make sure their beliefs are sufficiently aligned, but aligning beliefs might be costly. We present a preliminary theoretical analysis of this trade-off and its dependence on task structure.

Original language	English
Title of host publication	International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022
Publisher	International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
Pages	235-243
Number of pages	9
ISBN (Electronic)	978-171385433-3
Publication status	Published - 2022
Event	21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022 - Auckland, Virtual, New Zealand Duration: 9 May 2022 → 13 May 2022

Publication series

Name	Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
Volume	1
ISSN (Print)	1548-8403
ISSN (Electronic)	1558-2914

Conference

Conference	21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022
Country/Territory	New Zealand
City	Auckland, Virtual
Period	9/05/22 → 13/05/22

Bibliographical note

Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

Bayesian Reinforcement Learning
Computational Rationality
Hybrid Intelligence
Multiagent Learning

Access to Document

3535850.3535878: Final published version, 1.25 MB

Cite this

Çelikok, M. M., Oliehoek, F. A., & Kaski, S. (2022). Best-Response Bayesian Reinforcement Learning with Bayes-adaptive POMDPs for Centaurs. In International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022 (pp. 235-243). (Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS; Vol. 1). International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS).

Çelikok, Mustafa Mert ; Oliehoek, Frans A. ; Kaski, Samuel. / Best-Response Bayesian Reinforcement Learning with Bayes-adaptive POMDPs for Centaurs. International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022. International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS), 2022. pp. 235-243 (Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS).

@inproceedings{75bcb8d7ed7540a292bd2dc660f21b42,

title = "Best-Response Bayesian Reinforcement Learning with Bayes-adaptive POMDPs for Centaurs",

abstract = "Centaurs are half-human, half-AI decision-makers where the AI's goal is to complement the human. To do so, the AI must be able to recognize the goals and constraints of the human and have the means to help them. We present a novel formulation of the interaction between the human and the AI as a sequential game where the agents are modelled using Bayesian best-response models. We show that in this case the AI's problem of helping bounded-rational humans make better decisions reduces to a Bayes-adaptive POMDP. In our simulated experiments, we consider an instantiation of our framework for humans who are subjectively optimistic about the AI's future behaviour. Our results show that when equipped with a model of the human, the AI can infer the human's bounds and nudge them towards better decisions. We discuss ways in which the machine can learn to improve upon its own limitations as well with the help of the human. We identify a novel trade-off for centaurs in partially observable tasks: for the AI's actions to be acceptable to the human, the machine must make sure their beliefs are sufficiently aligned, but aligning beliefs might be costly. We present a preliminary theoretical analysis of this trade-off and its dependence on task structure.",

keywords = "Bayesian Reinforcement Learning, Computational Rationality, Hybrid Intelligence, Multiagent Learning",

author = "{\c C}elikok, {Mustafa Mert} and Oliehoek, {Frans A.} and Kaski, Samuel",

note = "Green Open Access added to TU Delft Institutional Repository {\textquoteleft}You share, we take care!{\textquoteright} – Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public. ; 21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022 ; Conference date: 09-05-2022 Through 13-05-2022",

year = "2022",

language = "English",

series = "Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS",

publisher = "International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)",

pages = "235--243",

booktitle = "International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022",

}

Çelikok, MM, Oliehoek, FA & Kaski, S 2022, Best-Response Bayesian Reinforcement Learning with Bayes-adaptive POMDPs for Centaurs. in International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022. Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS, vol. 1, International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS), pp. 235-243, 21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022, Auckland, Virtual, New Zealand, 9/05/22.

Best-Response Bayesian Reinforcement Learning with Bayes-adaptive POMDPs for Centaurs. / Çelikok, Mustafa Mert ; Oliehoek, Frans A.; Kaski, Samuel.
International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022. International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS), 2022. p. 235-243 (Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS; Vol. 1).

TY - GEN

T1 - Best-Response Bayesian Reinforcement Learning with Bayes-adaptive POMDPs for Centaurs

AU - Çelikok, Mustafa Mert

AU - Oliehoek, Frans A.

AU - Kaski, Samuel

N1 - Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

PY - 2022

Y1 - 2022

N2 - Centaurs are half-human, half-AI decision-makers where the AI's goal is to complement the human. To do so, the AI must be able to recognize the goals and constraints of the human and have the means to help them. We present a novel formulation of the interaction between the human and the AI as a sequential game where the agents are modelled using Bayesian best-response models. We show that in this case the AI's problem of helping bounded-rational humans make better decisions reduces to a Bayes-adaptive POMDP. In our simulated experiments, we consider an instantiation of our framework for humans who are subjectively optimistic about the AI's future behaviour. Our results show that when equipped with a model of the human, the AI can infer the human's bounds and nudge them towards better decisions. We discuss ways in which the machine can learn to improve upon its own limitations as well with the help of the human. We identify a novel trade-off for centaurs in partially observable tasks: for the AI's actions to be acceptable to the human, the machine must make sure their beliefs are sufficiently aligned, but aligning beliefs might be costly. We present a preliminary theoretical analysis of this trade-off and its dependence on task structure.

AB - Centaurs are half-human, half-AI decision-makers where the AI's goal is to complement the human. To do so, the AI must be able to recognize the goals and constraints of the human and have the means to help them. We present a novel formulation of the interaction between the human and the AI as a sequential game where the agents are modelled using Bayesian best-response models. We show that in this case the AI's problem of helping bounded-rational humans make better decisions reduces to a Bayes-adaptive POMDP. In our simulated experiments, we consider an instantiation of our framework for humans who are subjectively optimistic about the AI's future behaviour. Our results show that when equipped with a model of the human, the AI can infer the human's bounds and nudge them towards better decisions. We discuss ways in which the machine can learn to improve upon its own limitations as well with the help of the human. We identify a novel trade-off for centaurs in partially observable tasks: for the AI's actions to be acceptable to the human, the machine must make sure their beliefs are sufficiently aligned, but aligning beliefs might be costly. We present a preliminary theoretical analysis of this trade-off and its dependence on task structure.

KW - Bayesian Reinforcement Learning

KW - Computational Rationality

KW - Hybrid Intelligence

KW - Multiagent Learning

UR - http://www.scopus.com/inward/record.url?scp=85134303565&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85134303565

T3 - Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS

SP - 235

EP - 243

BT - International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022

PB - International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)

T2 - 21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022

Y2 - 9 May 2022 through 13 May 2022

ER -

Çelikok MM, Oliehoek FA, Kaski S. Best-Response Bayesian Reinforcement Learning with Bayes-adaptive POMDPs for Centaurs. In International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022. International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS). 2022. p. 235-243. (Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS).
