Safe Policies for Factored Partially Observable Stochastic Games

Steven Carr; Nils Jansen; Suda Bharadwaj; M.T.J. Spaan; Ufuk Topcu

doi:10.15607/RSS.2021.XVII.079

Algorithmics

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

We study planning problems where a controllable agent operates under partial observability and interacts with an uncontrollable opponent, also referred to as the adversary. The agent has two distinct objectives: to maximize an expected value and to adhere to a safety specification. Multi-objective partially observable stochastic games (POSGs) formally model such problems. Yet, even for a single objective, the task of computing suitable policies for POSGs is theoretically hard and computationally intractable in practice. Using a factored state-space representation, we define a decoupling scheme for the POSG state space that, under certain assumptions on the observability and the reward structure, separates the state components relevant for the reward from those relevant for safety. This decoupling affects the possibility to compute provably safe and reward-optimal policies in a tractable two-stage approach. In particular, on the fully observable components related to safety, we exactly compute the set of policies that captures all possible safe choices against the opponent. We restrict the agent’s behavior to these safe policies and project the POSG to a partially observable Markov decision process (POMDP). Any reward-maximal policy for the POMDP is then guaranteed to be safe and reward-maximal for the POSG. We showcase our approach’s feasibility using high-fidelity simulations of two case studies that concern UAV path planning and autonomous driving. Moreover, to demonstrate the practical applicability, we design a physical experiment involving a robot decision-making problem under energy constraints that is motivated by a paired helicopter with NASA’s Perseverance Mars rover.
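
The first stage described in the abstract, computing every safe choice on the fully observable safety components, can be illustrated with a standard safety-game fixed point. The following is a minimal sketch under stated assumptions and is not the authors' implementation: the function name `safe_actions`, the toy states and actions, and the encoding of the opponent as a set of possible successor states per agent action are all hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Set, Tuple

State = Hashable
Action = Hashable
# Hypothetical encoding: (state, agent action) -> every successor the
# opponent (or chance) could force. A missing or empty entry means the
# action is unavailable in that state.
Successors = Dict[Tuple[State, Action], FrozenSet[State]]


def safe_actions(
    states: Iterable[State],
    actions: Iterable[Action],
    succ: Successors,
    unsafe: Set[State],
) -> Dict[State, Set[Action]]:
    """Return, for each state the agent can keep safe forever, the actions
    whose every possible successor stays inside that safe region."""
    actions = list(actions)
    winning = set(states) - set(unsafe)

    def stays_safe(s: State, a: Action) -> bool:
        targets = succ.get((s, a))
        return bool(targets) and targets <= winning

    # Greatest fixed point: repeatedly drop states with no safe action left.
    changed = True
    while changed:
        changed = False
        for s in list(winning):
            if not any(stays_safe(s, a) for a in actions):
                winning.discard(s)
                changed = True

    return {s: {a for a in actions if stays_safe(s, a)} for s in winning}


# Toy example: state 3 is unsafe, and from state 1 the opponent can
# always force a move into state 3.
toy_succ: Successors = {
    (0, "stay"): frozenset({0}),
    (0, "go"): frozenset({1, 2}),
    (1, "stay"): frozenset({1, 3}),
    (1, "go"): frozenset({3}),
    (2, "stay"): frozenset({2}),
    (2, "go"): frozenset({0}),
    (3, "stay"): frozenset({3}),
    (3, "go"): frozenset({3}),
}
shield = safe_actions(range(4), ["stay", "go"], toy_succ, unsafe={3})
```

In this toy game, state 1 is removed because both of its actions can lead to the unsafe state, which in turn makes `"go"` unsafe in state 0. Restricting the agent to the returned action sets corresponds to the second stage in the abstract, where the remaining reward-maximization problem is solved on the restricted model.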

Original language	English
Title of host publication	Robotics: Science and Systems XVII
Editors	Dylan A. Shell, Marc Toussaint, M. Ani Hsieh
Number of pages	11
ISBN (Electronic)	978-0-9923747-7-8
DOIs	https://doi.org/10.15607/RSS.2021.XVII.079
Publication status	Published - 2021
Event	Robotics: Science and Systems XVII, 2021 - Duration: 12 Jul 2021 → 16 Jul 2021

Conference

Conference	Robotics: Science and Systems XVII, 2021
Period	12/07/21 → 16/07/21

Cite this

@inproceedings{d5fb62e29d014dc6806749d35228ab3a,

title = "Safe Policies for Factored Partially Observable Stochastic Games",

abstract = "We study planning problems where a controllable agent operates under partial observability and interacts with an uncontrollable opponent, also referred to as the adversary. The agent has two distinct objectives: to maximize an expected value and to adhere to a safety specification. Multi-objective partially observable stochastic games (POSGs) formally model such problems. Yet, even for a single objective, the task of computing suitable policies for POSGs is theoretically hard and computationally intractable in practice. Using a factored state-space representation, we define a decoupling scheme for the POSG state space that, under certain assumptions on the observability and the reward structure, separates the state components relevant for the reward from those relevant for safety. This decoupling affects the possibility to compute provably safe and reward-optimal policies in a tractable two-stage approach. In particular, on the fully observable components related to safety, we exactly compute the set of policies that captures all possible safe choices against the opponent. We restrict the agent{\textquoteright}s behavior to these safe policies and project the POSG to a partially observable Markov decision process (POMDP). Any reward-maximal policy for the POMDP is then guaranteed to be safe and reward-maximal for the POSG. We showcase our approach{\textquoteright}s feasibility using high-fidelity simulations of two case studies that concern UAV path planning and autonomous driving. Moreover, to demonstrate the practical applicability, we design a physical experiment involving a robot decision-making problem under energy constraints that is motivated by a paired helicopter with NASA{\textquoteright}s Perseverance Mars rover.",

author = "Steven Carr and Nils Jansen and Suda Bharadwaj and M.T.J. Spaan and Ufuk Topcu",

year = "2021",

doi = "10.15607/RSS.2021.XVII.079",

language = "English",

editor = "Shell, {Dylan A.} and Toussaint, {Marc} and Hsieh, {M. Ani}",

booktitle = "Robotics: Science and Systems XVII",

note = "Robotics: Science and Systems XVII, 2021 ; Conference date: 12-07-2021 Through 16-07-2021",

}

TY - GEN

T1 - Safe Policies for Factored Partially Observable Stochastic Games

AU - Carr, Steven

AU - Jansen, Nils

AU - Bharadwaj, Suda

AU - Spaan, M.T.J.

AU - Topcu, Ufuk

PY - 2021

Y1 - 2021

AB - We study planning problems where a controllable agent operates under partial observability and interacts with an uncontrollable opponent, also referred to as the adversary. The agent has two distinct objectives: to maximize an expected value and to adhere to a safety specification. Multi-objective partially observable stochastic games (POSGs) formally model such problems. Yet, even for a single objective, the task of computing suitable policies for POSGs is theoretically hard and computationally intractable in practice. Using a factored state-space representation, we define a decoupling scheme for the POSG state space that, under certain assumptions on the observability and the reward structure, separates the state components relevant for the reward from those relevant for safety. This decoupling affects the possibility to compute provably safe and reward-optimal policies in a tractable two-stage approach. In particular, on the fully observable components related to safety, we exactly compute the set of policies that captures all possible safe choices against the opponent. We restrict the agent’s behavior to these safe policies and project the POSG to a partially observable Markov decision process (POMDP). Any reward-maximal policy for the POMDP is then guaranteed to be safe and reward-maximal for the POSG. We showcase our approach’s feasibility using high-fidelity simulations of two case studies that concern UAV path planning and autonomous driving. Moreover, to demonstrate the practical applicability, we design a physical experiment involving a robot decision-making problem under energy constraints that is motivated by a paired helicopter with NASA’s Perseverance Mars rover.

U2 - 10.15607/RSS.2021.XVII.079

DO - 10.15607/RSS.2021.XVII.079

M3 - Conference contribution

BT - Robotics: Science and Systems XVII

A2 - Shell, Dylan A.

A2 - Toussaint, Marc

A2 - Hsieh, M. Ani

T2 - Robotics: Science and Systems XVII, 2021

Y2 - 12 July 2021 through 16 July 2021

ER -
