Abstract
The development of multi-agent reinforcement learning has been largely driven by the question of how to design learning algorithms that reach some particular notion of optimality of strategies, e.g. Nash equilibria. The set of optimal strategies is not known before the learning algorithm is executed; however, we can often immediately identify a set of clearly undesirable outcomes. Therefore, we propose to consider a dual problem: given a collection of agent algorithms and a collection of unwanted strategy profiles, can one identify a set of starting strategies that invariably lead there? This leads us to study the algorithmic problem of backpropagation of the constraints defining the forbidden region through the learning dynamics, viewed through the lens of set-valued maps and interval arithmetic.
Original language | English |
---|---|
Number of pages | 4 |
Publication status | Published - 2021 |
Event | COMARL AAAI 2021: Spring Symposium Series - Stanford University, Palo Alto, United States; Duration: 22 Mar 2021 → 23 Mar 2021 |
Conference
Conference | COMARL AAAI 2021 |
---|---|
Country/Territory | United States |
City | Palo Alto |
Period | 22/03/21 → 23/03/21 |