TY - GEN
T1 - Scalable Safe Policy Improvement for Factored Multi-Agent MDPs
AU - Bianchi, Federico
AU - Zorzi, Edoardo
AU - Castellini, Alberto
AU - Simão, Thiago D.
AU - Spaan, Matthijs T.J.
AU - Farinelli, Alessandro
PY - 2024
Y1 - 2024
N2 - In this work, we focus on safe policy improvement in multi-agent domains where current state-of-the-art methods cannot be effectively applied because of large state and action spaces. We consider recent results using Monte Carlo Tree Search for Safe Policy Improvement with Baseline Bootstrapping and propose a novel algorithm that scales this approach to multi-agent domains, exploiting the factorization of the transition model and value function. Given a centralized behavior policy and a dataset of trajectories, our algorithm generates an improved policy by selecting joint actions using a novel extension of Max-Plus (or Variable Elimination) that constrains local actions to guarantee safety criteria. An empirical evaluation on multi-agent SysAdmin and multi-UAV Delivery shows that the approach scales to very large domains where state-of-the-art methods cannot work.
AB - In this work, we focus on safe policy improvement in multi-agent domains where current state-of-the-art methods cannot be effectively applied because of large state and action spaces. We consider recent results using Monte Carlo Tree Search for Safe Policy Improvement with Baseline Bootstrapping and propose a novel algorithm that scales this approach to multi-agent domains, exploiting the factorization of the transition model and value function. Given a centralized behavior policy and a dataset of trajectories, our algorithm generates an improved policy by selecting joint actions using a novel extension of Max-Plus (or Variable Elimination) that constrains local actions to guarantee safety criteria. An empirical evaluation on multi-agent SysAdmin and multi-UAV Delivery shows that the approach scales to very large domains where state-of-the-art methods cannot work.
UR - http://www.scopus.com/inward/record.url?scp=85203829516&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85203829516
VL - 235
T3 - Proceedings of Machine Learning Research
SP - 3952
EP - 3973
BT - International Conference on Machine Learning
A2 - Salakhutdinov, Ruslan
A2 - Kolter, Zico
A2 - Heller, Katherine
T2 - 41st International Conference on Machine Learning, ICML 2024
Y2 - 21 July 2024 through 27 July 2024
ER -