TY - GEN
T1 - AlwaysSafe: Reinforcement Learning without Safety Constraint Violations during Training
AU - Simão, T. D.
AU - Jansen, N.
AU - Spaan, M. T. J.
N1 - Conference code: 20
PY - 2021
Y1 - 2021
AB - Deploying reinforcement learning (RL) involves major concerns around safety. Engineering a reward signal that allows the agent to maximize its performance while remaining safe is not trivial. Safe RL studies how to mitigate such problems. For instance, we can decouple safety from reward using constrained Markov decision processes (CMDPs), where an independent signal models the safety aspects. In this setting, an RL agent can autonomously find tradeoffs between performance and safety. Unfortunately, most RL agents designed for CMDPs only guarantee safety after the learning phase, which might prevent their direct deployment. In this work, we investigate settings where a concise abstract model of the safety aspects is given, a reasonable assumption since a thorough understanding of safety-related matters is a prerequisite for deploying RL in typical applications. Factored CMDPs provide such compact models when a small subset of features describes the dynamics relevant to the safety constraints. We propose an RL algorithm that uses this abstract model to learn policies for CMDPs safely, that is, without violating the constraints. During training, this algorithm can seamlessly switch from a conservative policy to a greedy policy without violating the safety constraints. We prove that this algorithm is safe under the given assumptions. Empirically, we show that even when the safety and reward signals are contradictory, the algorithm always operates safely, and that, when they are aligned, it also improves the agent's performance.
M3 - Conference contribution
T3 - AAMAS '21
SP - 1226
EP - 1235
BT - Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems
PB - International Foundation for Autonomous Agents and Multiagent Systems
CY - Richland, SC
T2 - 20th International Conference on Autonomous Agents and MultiAgent Systems
Y2 - 3 May 2021 through 7 May 2021
ER -