Refined Risk Management in Safe Reinforcement Learning with a Distributional Safety Critic

Q. Yang; T. D. Simão; Simon H. Tindemans; M.T.J. Spaan

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

Safety is critical to broadening the real-world use of reinforcement learning (RL). Modeling the safety aspects using a safety-cost signal separate from the reward is becoming standard practice, since it avoids the problem of finding a good balance between safety and performance. However, the total safety-cost distribution of different trajectories is still largely unexplored. In this paper, we propose an actor-critic method for safe RL that uses an implicit quantile network to approximate the distribution of accumulated safety-costs. Using an accurate estimate of the distribution of accumulated safety-costs, in particular of the upper tail of the distribution, greatly improves the performance of risk-averse RL agents. The empirical analysis shows that our method achieves good risk control in complex safety-constrained environments.
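The two ingredients named in the abstract, a quantile-based critic for accumulated safety-costs and a risk-averse focus on the upper tail of that distribution, can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes the standard quantile Huber loss used to train implicit quantile networks, and it assumes the upper-tail risk measure is the conditional value-at-risk (CVaR) at level `alpha`, estimated with a midpoint rule over quantile fractions. The function names `quantile_huber_loss` and `upper_tail_cvar` and the parameters `kappa`, `alpha`, and `n` are hypothetical.

```python
from typing import Callable

import numpy as np


def quantile_huber_loss(pred: np.ndarray, taus: np.ndarray,
                        target: np.ndarray, kappa: float = 1.0) -> float:
    """Quantile Huber loss in the style of implicit quantile networks.

    pred:   (N,) predicted cost quantiles Z(s, a; tau_i)
    taus:   (N,) quantile fractions tau_i in (0, 1) that produced `pred`
    target: (M,) target samples, e.g. c + gamma * Z(s', a'; tau'_j)
    """
    # Pairwise errors delta_ij = target_j - pred_i, shape (N, M)
    delta = target[None, :] - pred[:, None]
    abs_delta = np.abs(delta)
    # Huber loss: quadratic inside [-kappa, kappa], linear outside
    huber = np.where(abs_delta <= kappa,
                     0.5 * delta ** 2,
                     kappa * (abs_delta - 0.5 * kappa))
    # Asymmetric quantile weight |tau_i - 1{delta_ij < 0}|: high tau
    # penalizes underestimating the cost more than overestimating it
    weight = np.abs(taus[:, None] - (delta < 0).astype(float))
    # Average over target samples j, sum over predicted quantiles i
    return float((weight * huber / kappa).mean(axis=1).sum())


def upper_tail_cvar(quantile_fn: Callable[[np.ndarray], np.ndarray],
                    alpha: float, n: int = 1000) -> float:
    """CVaR of the worst alpha-fraction of accumulated costs.

    Approximates (1 / alpha) * integral_{1-alpha}^{1} Z(tau) dtau with a
    midpoint rule, where quantile_fn(tau) returns the cost quantile Z(tau).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    # Midpoints of n equal slices of the upper tail [1 - alpha, 1]
    taus = 1.0 - alpha + alpha * (np.arange(n) + 0.5) / n
    return float(np.mean(quantile_fn(taus)))
```

For example, if the accumulated cost were uniform on [0, 10], its quantile function is `lambda t: 10 * t`, and `upper_tail_cvar(lambda t: 10 * t, alpha=0.1)` returns approximately 9.5, the mean of the worst 10% of outcomes, whereas the expected cost is only 5. A risk-averse actor could constrain or penalize this tail estimate instead of the mean (for instance through a Lagrange multiplier); that coupling is an assumption here, since the abstract does not specify how the actor uses the critic.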

Original language	English
Title of host publication	Safe RL Workshop at IJCAI 2022
Editors	David Bossens, Stephen Giguere, Roderick Bloem, Bettina Koenighofer
Number of pages	4
Publication status	Published - 2022
Event	International Workshop on Safe Reinforcement Learning - Vienna, Austria
Duration	23 Jul 2022 → 23 Jul 2022
Conference number	1

Workshop

Workshop	International Workshop on Safe Reinforcement Learning
Abbreviated title	Safe RL workshop
Country/Territory	Austria
City	Vienna
Period	23/07/22 → 23/07/22

Cite this

@inproceedings{7cfa2a336ad7458a827d06e9d57a969c,
  title = "Refined Risk Management in Safe Reinforcement Learning with a Distributional Safety Critic",
  abstract = "Safety is critical to broadening the real-world use of reinforcement learning (RL). Modeling the safety aspects using a safety-cost signal separate from the reward is becoming standard practice, since it avoids the problem of finding a good balance between safety and performance. However, the total safety-cost distribution of different trajectories is still largely unexplored. In this paper, we propose an actor-critic method for safe RL that uses an implicit quantile network to approximate the distribution of accumulated safety-costs. Using an accurate estimate of the distribution of accumulated safety-costs, in particular of the upper tail of the distribution, greatly improves the performance of risk-averse RL agents. The empirical analysis shows that our method achieves good risk control in complex safety-constrained environments.",
  author = "Yang, Q. and Sim{\~a}o, T. D. and Tindemans, Simon H. and Spaan, M.T.J.",
  year = "2022",
  language = "English",
  editor = "Bossens, David and Giguere, Stephen and Bloem, Roderick and Koenighofer, Bettina",
  booktitle = "Safe RL Workshop at IJCAI 2022",
  note = "International Workshop on Safe Reinforcement Learning, Safe RL workshop; Conference date: 23-07-2022 through 23-07-2022",
}

Refined Risk Management in Safe Reinforcement Learning with a Distributional Safety Critic. / Yang, Q.; Simão, T. D.; Tindemans, Simon H. et al.
Safe RL Workshop at IJCAI 2022. ed. / David Bossens; Stephen Giguere; Roderick Bloem; Bettina Koenighofer. 2022.

TY  - GEN
T1  - Refined Risk Management in Safe Reinforcement Learning with a Distributional Safety Critic
AU  - Yang, Q.
AU  - Simão, T. D.
AU  - Tindemans, Simon H.
AU  - Spaan, M.T.J.
N1  - Conference code: 1
PY  - 2022
Y1  - 2022
N2  - Safety is critical to broadening the real-world use of reinforcement learning (RL). Modeling the safety aspects using a safety-cost signal separate from the reward is becoming standard practice, since it avoids the problem of finding a good balance between safety and performance. However, the total safety-cost distribution of different trajectories is still largely unexplored. In this paper, we propose an actor-critic method for safe RL that uses an implicit quantile network to approximate the distribution of accumulated safety-costs. Using an accurate estimate of the distribution of accumulated safety-costs, in particular of the upper tail of the distribution, greatly improves the performance of risk-averse RL agents. The empirical analysis shows that our method achieves good risk control in complex safety-constrained environments.
AB  - Safety is critical to broadening the real-world use of reinforcement learning (RL). Modeling the safety aspects using a safety-cost signal separate from the reward is becoming standard practice, since it avoids the problem of finding a good balance between safety and performance. However, the total safety-cost distribution of different trajectories is still largely unexplored. In this paper, we propose an actor-critic method for safe RL that uses an implicit quantile network to approximate the distribution of accumulated safety-costs. Using an accurate estimate of the distribution of accumulated safety-costs, in particular of the upper tail of the distribution, greatly improves the performance of risk-averse RL agents. The empirical analysis shows that our method achieves good risk control in complex safety-constrained environments.
UR  - https://sites.google.com/view/safe-rl-2022/papers
M3  - Conference contribution
BT  - Safe RL Workshop at IJCAI 2022
A2  - Bossens, David
A2  - Giguere, Stephen
A2  - Bloem, Roderick
A2  - Koenighofer, Bettina
T2  - International Workshop on Safe Reinforcement Learning
Y2  - 23 July 2022 through 23 July 2022
ER  - 
