General-Sum Multi-Agent Continuous Inverse Optimal Control

Christian Neumeyer; Frans A. Oliehoek; Dariu M. Gavrila

doi:10.1109/LRA.2021.3060411

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Modeling possible future outcomes of robot-human interactions is of importance in the intelligent vehicle and mobile robotics domains. Knowing the reward function that explains the observed behavior of a human agent is advantageous for modeling the behavior with Markov Decision Processes (MDPs). However, learning the rewards that determine the observed actions from data is complicated by interactions. We present a novel inverse reinforcement learning (IRL) algorithm that can infer the reward function in multi-agent interactive scenarios. In particular, the agents may act boundedly rational (i.e., sub-optimal), a characteristic that is typical of human decision making. Additionally, every agent optimizes its own reward function, which makes it possible to address non-cooperative setups. In contrast to other methods, the algorithm does not rely on reinforcement learning during inference of the parameters of the reward function. We demonstrate that our proposed method accurately infers the ground truth reward function in two-agent interactive experiments.

Original language	English
Pages (from-to)	3429-3436
Journal	IEEE Robotics and Automation Letters
Volume	6
Issue number	2
DOIs	https://doi.org/10.1109/LRA.2021.3060411
Publication status	Published - 2021

Bibliographical note

Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

Inverse Reinforcement Learning
Learning from Demonstration
Reinforcement Learning

Access to Document

10.1109/LRA.2021.3060411

09357891 (Final published version, 558 KB)

Cite this

@article{a5be5a49470f4a64ad44e4586beedd25,

title = "General-Sum Multi-Agent Continuous Inverse Optimal Control",

abstract = "Modeling possible future outcomes of robot-human interactions is of importance in the intelligent vehicle and mobile robotics domains. Knowing the reward function that explains the observed behavior of a human agent is advantageous for modeling the behavior with Markov Decision Processes (MDPs). However, learning the rewards that determine the observed actions from data is complicated by interactions. We present a novel inverse reinforcement learning (IRL) algorithm that can infer the reward function in multi-agent interactive scenarios. In particular, the agents may act boundedly rational (i.e., sub-optimal), a characteristic that is typical of human decision making. Additionally, every agent optimizes its own reward function, which makes it possible to address non-cooperative setups. In contrast to other methods, the algorithm does not rely on reinforcement learning during inference of the parameters of the reward function. We demonstrate that our proposed method accurately infers the ground truth reward function in two-agent interactive experiments.",

keywords = "Inverse Reinforcement Learning, Learning from Demonstration, Reinforcement Learning",

author = "Neumeyer, Christian and Oliehoek, {Frans A.} and Gavrila, {Dariu M.}",

note = "Green Open Access added to TU Delft Institutional Repository {\textquoteleft}You share, we take care!{\textquoteright} -- Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.",

year = "2021",

doi = "10.1109/LRA.2021.3060411",

language = "English",

volume = "6",

pages = "3429--3436",

journal = "IEEE Robotics and Automation Letters",

issn = "2377-3766",

publisher = "Institute of Electrical and Electronics Engineers (IEEE)",

number = "2",

}

TY - JOUR

T1 - General-Sum Multi-Agent Continuous Inverse Optimal Control

AU - Neumeyer, Christian

AU - Oliehoek, Frans A.

AU - Gavrila, Dariu M.

N1 - Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

PY - 2021

Y1 - 2021

N2 - Modeling possible future outcomes of robot-human interactions is of importance in the intelligent vehicle and mobile robotics domains. Knowing the reward function that explains the observed behavior of a human agent is advantageous for modeling the behavior with Markov Decision Processes (MDPs). However, learning the rewards that determine the observed actions from data is complicated by interactions. We present a novel inverse reinforcement learning (IRL) algorithm that can infer the reward function in multi-agent interactive scenarios. In particular, the agents may act boundedly rational (i.e., sub-optimal), a characteristic that is typical of human decision making. Additionally, every agent optimizes its own reward function, which makes it possible to address non-cooperative setups. In contrast to other methods, the algorithm does not rely on reinforcement learning during inference of the parameters of the reward function. We demonstrate that our proposed method accurately infers the ground truth reward function in two-agent interactive experiments.

AB - Modeling possible future outcomes of robot-human interactions is of importance in the intelligent vehicle and mobile robotics domains. Knowing the reward function that explains the observed behavior of a human agent is advantageous for modeling the behavior with Markov Decision Processes (MDPs). However, learning the rewards that determine the observed actions from data is complicated by interactions. We present a novel inverse reinforcement learning (IRL) algorithm that can infer the reward function in multi-agent interactive scenarios. In particular, the agents may act boundedly rational (i.e., sub-optimal), a characteristic that is typical of human decision making. Additionally, every agent optimizes its own reward function, which makes it possible to address non-cooperative setups. In contrast to other methods, the algorithm does not rely on reinforcement learning during inference of the parameters of the reward function. We demonstrate that our proposed method accurately infers the ground truth reward function in two-agent interactive experiments.

KW - Inverse Reinforcement Learning

KW - Learning from Demonstration

KW - Reinforcement Learning

UR - http://www.scopus.com/inward/record.url?scp=85101754599&partnerID=8YFLogxK

U2 - 10.1109/LRA.2021.3060411

DO - 10.1109/LRA.2021.3060411

M3 - Article

AN - SCOPUS:85101754599

SN - 2377-3766

VL - 6

SP - 3429

EP - 3436

JO - IEEE Robotics and Automation Letters

JF - IEEE Robotics and Automation Letters

IS - 2

ER -
