Scalable Safe Policy Improvement for Factored Multi-Agent MDPs

Federico Bianchi*, Edoardo Zorzi, Alberto Castellini, Thiago D. Simão, Matthijs T.J. Spaan, Alessandro Farinelli

*Corresponding author for this work

Research output: Conference contribution (chapter in book / conference proceedings / edited volume), scientific, peer-reviewed


Abstract

In this work, we focus on safe policy improvement in multi-agent domains where current state-of-the-art methods cannot be effectively applied because of large state and action spaces. We consider recent results using Monte Carlo Tree Search for Safe Policy Improvement with Baseline Bootstrapping and propose a novel algorithm that scales this approach to multi-agent domains, exploiting the factorization of the transition model and value function. Given a centralized behavior policy and a dataset of trajectories, our algorithm generates an improved policy by selecting joint actions using a novel extension of Max-Plus (or Variable Elimination) that constrains local actions to guarantee safety criteria. An empirical evaluation on multi-agent SysAdmin and multi-UAV Delivery shows that the approach scales to very large domains where state-of-the-art methods cannot work.
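The abstract's joint-action selection builds on Max-Plus / Variable Elimination over a coordination graph, where the joint value factors into local payoff functions and agents are eliminated one at a time. As a minimal illustration of that underlying idea (not the paper's safety-constrained extension), the sketch below maximizes a sum of two hypothetical local payoffs `q12` and `q23` for three agents by eliminating agents in reverse order and back-substituting; all payoff values are invented for the example.

```python
# Illustrative variable elimination on a 3-agent chain coordination graph:
# maximize Q(a1, a2, a3) = q12(a1, a2) + q23(a2, a3) over joint actions.
# The payoff tables are hypothetical; this sketches only the standard
# max-sum / variable-elimination idea, not the paper's algorithm.
from itertools import product

actions = [0, 1]  # each agent chooses a binary local action

# Hypothetical local payoff tables for edges (1,2) and (2,3).
q12 = {(a1, a2): float(a1 ^ a2) for a1, a2 in product(actions, actions)}
q23 = {(a2, a3): 2.0 * (a2 & a3) for a2, a3 in product(actions, actions)}

# Eliminate agent 3: for each a2, record agent 3's best response and value.
e3 = {a2: max(q23[(a2, a3)] for a3 in actions) for a2 in actions}
best_a3 = {a2: max(actions, key=lambda a3: q23[(a2, a3)]) for a2 in actions}

# Eliminate agent 2: fold the conditional value e3 into the (1,2) payoff.
e2 = {a1: max(q12[(a1, a2)] + e3[a2] for a2 in actions) for a1 in actions}
best_a2 = {a1: max(actions, key=lambda a2: q12[(a1, a2)] + e3[a2])
           for a1 in actions}

# Choose agent 1's action, then back-substitute for agents 2 and 3.
a1 = max(actions, key=lambda a: e2[a])
a2 = best_a2[a1]
a3 = best_a3[a2]
joint = (a1, a2, a3)
value = q12[(a1, a2)] + q23[(a2, a3)]  # equals the brute-force maximum
```

The cost of each elimination step is local (it depends only on an agent's neighbors in the graph, not on the full joint action space), which is what makes this style of coordination scale to many agents; the paper's contribution constrains the local action choices during elimination so the resulting joint policy satisfies the safety criteria.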

Original language: English
Title of host publication: International Conference on Machine Learning
Editors: Ruslan Salakhutdinov, Zico Kolter, Katherine Heller
Pages: 3952-3973
Number of pages: 22
Volume: 235
Publication status: Published - 2024
Event: 41st International Conference on Machine Learning, ICML 2024 - Vienna, Austria
Duration: 21 Jul 2024 to 27 Jul 2024

Publication series

Name: Proceedings of Machine Learning Research
ISSN (Print): 2640-3498

Conference

Conference: 41st International Conference on Machine Learning, ICML 2024
Country/Territory: Austria
City: Vienna
Period: 21/07/24 to 27/07/24
