General-Sum Multi-Agent Continuous Inverse Optimal Control

Research output: Contribution to journal, Article, Scientific, peer-review


Modeling possible future outcomes of robot-human interactions is important in the intelligent vehicle and mobile robotics domains. Knowing the reward function that explains the observed behavior of a human agent is advantageous for modeling that behavior with Markov Decision Processes (MDPs). However, learning from data the rewards that determine the observed actions is complicated by interactions between agents. We present a novel inverse reinforcement learning (IRL) algorithm that can infer the reward function in multi-agent interactive scenarios. In particular, the agents may be boundedly rational (i.e., suboptimal), a characteristic that is typical of human decision making. Additionally, every agent optimizes its own reward function, which makes it possible to address non-cooperative setups. In contrast to other methods, the algorithm does not rely on reinforcement learning when inferring the parameters of the reward function. We demonstrate that our proposed method accurately infers the ground-truth reward function in two-agent interactive experiments.

Original language: English
Pages (from-to): 3429-3436
Journal: IEEE Robotics and Automation Letters
Issue number: 2
Publication status: Published - 2021

Bibliographical note

Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.


Keywords

  • Inverse Reinforcement Learning
  • Learning from Demonstration
  • Reinforcement Learning

