Solving Transition-Independent Multi-agent MDPs with Sparse Interactions

Joris Scharpff, Diederik M. Roijers, Frans A. Oliehoek, Matthijs T. J. Spaan, M.M. de Weerdt

Research output: Chapter in Book/Conference proceedings/Edited volumeConference contributionScientificpeer-review

16 Citations (Scopus)


In cooperative multi-agent sequential decision making under uncertainty, agents must coordinate to find an optimal joint policy that maximises joint value. Typical algorithms exploit additive structure in the value function, but in the fully-observable multi-agent MDP (MMDP) setting such structure is not present. We propose a new optimal solver for transition-independent MMDPs, in which agents can only affect their own state but their reward depends on joint transitions. We represent these de- pendencies compactly in conditional return graphs (CRGs). Using CRGs the value of a joint policy and the bounds on partially specified joint policies can be efficiently computed. We propose CoRe, a novel branch-and-bound policy search algorithm building on CRGs. CoRe typically requires less runtime than the available alternatives and finds solutions to previously unsolvable problems.
Original languageEnglish
Title of host publicationProceedings of the Thirtieth AAAI Conference on Artificial Intelligence AAAI-16
PublisherAmerican Association for Artificial Intelligence (AAAI)
Number of pages7
Publication statusPublished - 2016
Event30th AAAI Conference on Artificial Intelligence, AAAI 2016) - Phoenix, United States
Duration: 12 Feb 201617 Feb 2016
Conference number: 30

Publication series

NameProceedings of the AAAI
PublisherAssociation for the Advancement of Artificial Intelligence.
ISSN (Print)2159-5399
ISSN (Electronic)2374-3468


Conference30th AAAI Conference on Artificial Intelligence, AAAI 2016)
Abbreviated titleAAAI'16
CountryUnited States


  • Markov Decision Process
  • Transition-independent Multi-agent MDPs
  • Reward interactions
  • Conditional Return Graphs

Fingerprint Dive into the research topics of 'Solving Transition-Independent Multi-agent MDPs with Sparse Interactions'. Together they form a unique fingerprint.

Cite this