## Abstract

We study planning problems where a controllable agent operates under partial observability and interacts with an uncontrollable opponent, also referred to as the adversary. The agent has two distinct objectives: to maximize an expected value and to adhere to a safety specification. Multi-objective partially observable stochastic games (POSGs) formally model such problems. Yet, even for a single objective, the task of computing suitable policies for POSGs is theoretically hard and computationally intractable in practice. Using a factored state-space representation, we define a decoupling scheme for the POSG state space that, under certain assumptions on the observability and the reward structure, separates the state components relevant for the reward from those relevant for safety. This decoupling makes it possible to compute provably safe and reward-optimal policies in a tractable two-stage approach. In particular, on the fully observable components related to safety, we exactly compute the set of policies that captures all possible safe choices against the opponent. We restrict the agent's behavior to these safe policies and project the POSG to a partially observable Markov decision process (POMDP). Any reward-maximal policy for the POMDP is then guaranteed to be safe and reward-maximal for the POSG. We showcase our approach's feasibility using high-fidelity simulations of two case studies concerning UAV path planning and autonomous driving. Moreover, to demonstrate practical applicability, we design a physical experiment involving a robot decision-making problem under energy constraints, motivated by the helicopter paired with NASA's Perseverance Mars rover.

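The two-stage approach described in the abstract can be illustrated on a toy, fully observable example. The sketch below is a hypothetical simplification, not the paper's implementation: stage one computes, as a greatest fixed point of a safety game, the set of actions per state whose every adversarial outcome stays safe; stage two runs value iteration restricted to those safe actions, pessimistic about which successor the adversary picks. All state, action, and reward values here are invented for illustration; the paper's actual second stage solves a POMDP.

```python
# Toy safety game: per state, each agent action maps to the set of
# successor states the adversary may choose among (invented example data).
STATES = {0, 1, 2, 3}
BAD = {3}  # unsafe states the agent must avoid
ACTIONS = {
    0: {"a": {1, 2}, "b": {3}},   # "b" lets the adversary force the bad state
    1: {"a": {0}, "b": {2}},
    2: {"a": {2}, "b": {1, 3}},
    3: {"a": {3}},
}
REWARD = {0: 0.0, 1: 1.0, 2: 5.0, 3: 0.0}

def safe_actions(states, bad, actions):
    """Stage 1: greatest fixed point. Keep an action only if every
    adversarial successor lies inside the (shrinking) safe set."""
    safe = set(states) - set(bad)
    while True:
        allowed = {
            s: {a for a, succ in actions[s].items() if succ <= safe}
            for s in safe
        }
        new_safe = {s for s in safe if allowed[s]}  # drop dead-end states
        if new_safe == safe:
            return allowed
        safe = new_safe

def best_safe_values(allowed, reward, gamma=0.9, iters=200):
    """Stage 2: value iteration over safe actions only, assuming the
    adversary picks the worst successor (min over successors)."""
    v = {s: 0.0 for s in allowed}
    for _ in range(iters):
        v = {
            s: reward[s] + gamma * max(
                min(v[t] for t in ACTIONS[s][a])
                for a in allowed[s]
            )
            for s in allowed
        }
    return v

allowed = safe_actions(STATES, BAD, ACTIONS)
values = best_safe_values(allowed, REWARD)
```

Restricting to `allowed` before optimizing is what makes any reward-maximal policy of stage two safe by construction, mirroring the abstract's guarantee at the POSG/POMDP level.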

| Original language | English |
| --- | --- |
| Title of host publication | Robotics: Science and Systems XVII |
| Editors | Dylan A. Shell, Marc Toussaint, M. Ani Hsieh |
| Number of pages | 11 |
| ISBN (Electronic) | 978-0-9923747-7-8 |
| DOIs | |
| Publication status | Published - 2021 |
| Event | Robotics: Science and Systems XVII, 2021, 12 Jul 2021 → 16 Jul 2021 |

### Conference

| Conference | Robotics: Science and Systems XVII, 2021 |
| --- | --- |
| Period | 12/07/21 → 16/07/21 |