Alternating Maximization with Behavioral Cloning

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review


Abstract

The key difficulty of cooperative, decentralized planning lies in making accurate predictions about the behavior of one's teammates. In this paper we introduce Alternating Maximization with Behavioral Cloning (ABC), a trainable online decentralized planning algorithm based on Monte Carlo Tree Search (MCTS), combined with models of teammates learned from previous episodic runs. Our algorithm relies on the idea of alternating maximization, where agents adapt their models one at a time in a round-robin manner. Under the assumption of perfect policy cloning, and given a sufficient number of Monte Carlo samples, successive iterations of our method are guaranteed to improve the joint policy and eventually converge.
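The round-robin alternating-maximization idea can be illustrated with a small sketch. The toy coordination task, the candidate policy sets, and the Monte Carlo return estimator below are hypothetical stand-ins and not the authors' implementation; in ABC the inner best-response step would instead be computed with MCTS against behaviorally cloned teammate models.

```python
"""Illustrative sketch of round-robin alternating maximization (assumptions noted above)."""
import random

N_AGENTS = 3
CANDIDATE_POLICIES = [0, 1, 2]  # a "policy" here is just a fixed action choice (toy setup)


def estimate_joint_return(joint_policy, n_samples=200, seed=0):
    """Hypothetical Monte Carlo estimate of the team return.

    The toy reward counts how many agents coordinate on the same action as
    agent 0, with small additive noise to mimic sampling error from rollouts.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        coordination = sum(a == joint_policy[0] for a in joint_policy)
        total += coordination + rng.gauss(0.0, 0.1)
    return total / n_samples


def alternating_maximization(initial_joint_policy, max_rounds=20):
    """Round-robin improvement: each agent updates its own policy while the
    (cloned) policies of its teammates stay fixed for that step."""
    joint = list(initial_joint_policy)
    for _ in range(max_rounds):
        changed = False
        for i in range(N_AGENTS):  # round-robin over agents
            best_pi = joint[i]
            best_val = estimate_joint_return(tuple(joint))
            for pi in CANDIDATE_POLICIES:  # stand-in for the MCTS planning step
                trial = joint.copy()
                trial[i] = pi
                val = estimate_joint_return(tuple(trial))
                if val > best_val:
                    best_pi, best_val = pi, val
            if best_pi != joint[i]:
                joint[i] = best_pi
                changed = True
        if not changed:  # fixed point: no agent can improve the joint return
            break
    return tuple(joint)


if __name__ == "__main__":
    print("converged joint policy:", alternating_maximization((0, 1, 2)))
```

With the noiseless-comparison setup above (a fixed seed per evaluation), each round-robin pass can only increase the estimated joint return, mirroring the monotone-improvement guarantee stated for perfect policy cloning and sufficiently many Monte Carlo samples.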
Original language: English
Title of host publication: BNAIC/BeneLearn 2020
Editors: Lu Cao, Walter Kosters, Jefrey Lijffijt
Publisher: RU Leiden
Pages: 370-371
Publication status: Published - 2020
Event: BNAIC/BENELEARN 2020 - Leiden, Netherlands
Duration: 19 Nov 2020 - 20 Nov 2020

Conference

Conference: BNAIC/BENELEARN 2020
Country: Netherlands
City: Leiden
Period: 19/11/20 - 20/11/20
