Abstract
Safety is critical to broadening the real-world use of reinforcement learning (RL). Modeling the safety aspects using a safety-cost signal separate from the reward is becoming standard practice, since it avoids the problem of finding a good balance between safety and performance. However, the distribution of the total safety-cost over different trajectories is still largely unexplored. In this paper, we propose an actor-critic method for safe RL that uses an implicit quantile network to approximate the distribution of accumulated safety-costs. Using an accurate estimate of the distribution of accumulated safety-costs, in particular of the upper tail of the distribution, greatly improves the performance of risk-averse RL agents. The empirical analysis shows that our method achieves good risk control in complex safety-constrained environments.
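To illustrate why the upper tail of the accumulated safety-cost distribution matters, the sketch below averages quantile values over the top fraction of quantile levels. It is not the paper's implementation: `toy_quantile_fn` is a hypothetical closed-form stand-in for a learned quantile network (the inverse CDF of an exponential distribution), and `upper_tail_mean`, `alpha`, and `n_samples` are names assumed here for illustration only.

```python
import math
import random


def toy_quantile_fn(tau, rate=1.0):
    """Hypothetical stand-in for a learned quantile network.

    Returns the tau-quantile of an Exponential(rate) cost distribution.
    """
    return -math.log(1.0 - tau) / rate


def upper_tail_mean(quantile_fn, alpha, n_samples=10000, seed=0):
    """Monte Carlo estimate of the mean cost in the upper (1 - alpha) tail.

    Quantile levels tau are sampled uniformly from [alpha, 1) and the
    corresponding quantile values are averaged.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        tau = alpha + (1.0 - alpha) * rng.random()
        tau = min(tau, 1.0 - 1e-12)  # guard against rounding up to 1.0
        total += quantile_fn(tau)
    return total / n_samples


# alpha = 0 averages over the whole distribution (the expected cost);
# alpha = 0.9 averages only over the worst 10% of outcomes.
mean_cost = upper_tail_mean(toy_quantile_fn, alpha=0.0)
tail_cost = upper_tail_mean(toy_quantile_fn, alpha=0.9)
```

For this toy distribution the expected cost is 1, while the mean of the worst 10% of outcomes is 1 + ln 10 (roughly 3.3), so an agent constrained on the tail estimate sees a much larger cost than one constrained on the mean alone.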
Original language | English |
---|---|
Title of host publication | Safe RL Workshop at IJCAI 2022 |
Editors | David Bossens, Stephen Giguere, Roderick Bloem, Bettina Koenighofer |
Number of pages | 4 |
Publication status | Published - 2022 |
Event | International Workshop on Safe Reinforcement Learning, Vienna, Austria. Duration: 23 Jul 2022 → 23 Jul 2022. Conference number: 1 |
Workshop
Workshop | International Workshop on Safe Reinforcement Learning |
---|---|
Abbreviated title | Safe RL workshop |
Country/Territory | Austria |
City | Vienna |
Period | 23/07/22 → 23/07/22 |