WCSAC: Worst-Case Soft Actor Critic for Safety-Constrained Reinforcement Learning

Q. Yang; T. D. Simão; S.H. Tindemans; M.T.J. Spaan

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

Safe exploration is regarded as a key priority area for reinforcement learning research. With separate reward and safety signals, it is natural to cast it as constrained reinforcement learning, where expected long-term costs of policies are constrained. However, it can be hazardous to set constraints on the expected safety signal without considering the tail of the distribution. For instance, in safety-critical domains, worst-case analysis is required to avoid disastrous results. We present a novel reinforcement learning algorithm called Worst-Case Soft Actor Critic, which extends the Soft Actor Critic algorithm with a safety critic to achieve risk control. More specifically, a certain level of conditional Value-at-Risk from the distribution is regarded as a safety measure to judge the constraint satisfaction, which guides the change of adaptive safety weights to achieve a trade-off between reward and safety. As a result, we can optimize policies under the premise that their worst-case performance satisfies the constraints. The empirical analysis shows that our algorithm attains better risk control compared to expectation-based methods.
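
The difference between an expectation-based constraint and a conditional Value-at-Risk (CVaR) constraint can be illustrated with a small sketch. This is not the paper's implementation: the function `empirical_cvar`, the sample costs, the risk level `alpha`, and the cost bound `bound` are all hypothetical, and CVaR is estimated here from plain samples as the mean of the worst `alpha` fraction of costs.

```python
import numpy as np


def empirical_cvar(costs, alpha):
    """Estimate CVaR at level alpha as the mean of the highest
    alpha-fraction of sampled costs (hypothetical helper)."""
    sorted_costs = np.sort(np.asarray(costs, dtype=float))[::-1]  # worst first
    k = max(1, int(np.ceil(alpha * len(sorted_costs))))  # size of the tail
    return float(sorted_costs[:k].mean())


# Hypothetical episode costs: mostly safe, with a heavy tail.
costs = [0, 0, 0, 0, 0, 0, 40, 80]
bound = 25.0   # assumed cost limit
alpha = 0.25   # assumed risk level: look at the worst 25% of episodes

expected_cost = float(np.mean(costs))
tail_cost = empirical_cvar(costs, alpha)

print("expected cost:", expected_cost, "satisfies bound:", expected_cost <= bound)
print("CVaR:", tail_cost, "satisfies bound:", tail_cost <= bound)
```

With these made-up numbers the expected cost stays under the bound while the tail measure does not, which is the situation the abstract describes as hazardous for expectation-based constraints.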

Original language	English
Title of host publication	Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI-21)
Pages	10639-10646
Number of pages	8
Publication status	Published - 2021

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care

Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

Reinforcement Learning

Access to Document

17272_Article_Text_20766_1_2_20210518: Final published version, 3.3 MB

https://ojs.aaai.org/index.php/AAAI/article/view/17272

Cite this

@inproceedings{8504e31160f14fc796e1921884e0900c,
  title = "WCSAC: Worst-Case Soft Actor Critic for Safety-Constrained Reinforcement Learning",
  abstract = "Safe exploration is regarded as a key priority area for reinforcement learning research. With separate reward and safety signals, it is natural to cast it as constrained reinforcement learning, where expected long-term costs of policies are constrained. However, it can be hazardous to set constraints on the expected safety signal without considering the tail of the distribution. For instance, in safety-critical domains, worst-case analysis is required to avoid disastrous results. We present a novel reinforcement learning algorithm called Worst-Case Soft Actor Critic, which extends the Soft Actor Critic algorithm with a safety critic to achieve risk control. More specifically, a certain level of conditional Value-at-Risk from the distribution is regarded as a safety measure to judge the constraint satisfaction, which guides the change of adaptive safety weights to achieve a trade-off between reward and safety. As a result, we can optimize policies under the premise that their worst-case performance satisfies the constraints. The empirical analysis shows that our algorithm attains better risk control compared to expectation-based methods.",
  keywords = "Reinforcement Learning",
  author = "Q. Yang and Sim{\~a}o, {T. D.} and S.H. Tindemans and M.T.J. Spaan",
  note = "Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.",
  year = "2021",
  language = "English",
  pages = "10639--10646",
  booktitle = "Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI-21)",
}

TY - GEN

T1 - WCSAC: Worst-Case Soft Actor Critic for Safety-Constrained Reinforcement Learning

AU - Yang, Q.

AU - Simão, T. D.

AU - Tindemans, S.H.

AU - Spaan, M.T.J.

N1 - Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

PY - 2021

Y1 - 2021

N2 - Safe exploration is regarded as a key priority area for reinforcement learning research. With separate reward and safety signals, it is natural to cast it as constrained reinforcement learning, where expected long-term costs of policies are constrained. However, it can be hazardous to set constraints on the expected safety signal without considering the tail of the distribution. For instance, in safety-critical domains, worst-case analysis is required to avoid disastrous results. We present a novel reinforcement learning algorithm called Worst-Case Soft Actor Critic, which extends the Soft Actor Critic algorithm with a safety critic to achieve risk control. More specifically, a certain level of conditional Value-at-Risk from the distribution is regarded as a safety measure to judge the constraint satisfaction, which guides the change of adaptive safety weights to achieve a trade-off between reward and safety. As a result, we can optimize policies under the premise that their worst-case performance satisfies the constraints. The empirical analysis shows that our algorithm attains better risk control compared to expectation-based methods.

AB - Safe exploration is regarded as a key priority area for reinforcement learning research. With separate reward and safety signals, it is natural to cast it as constrained reinforcement learning, where expected long-term costs of policies are constrained. However, it can be hazardous to set constraints on the expected safety signal without considering the tail of the distribution. For instance, in safety-critical domains, worst-case analysis is required to avoid disastrous results. We present a novel reinforcement learning algorithm called Worst-Case Soft Actor Critic, which extends the Soft Actor Critic algorithm with a safety critic to achieve risk control. More specifically, a certain level of conditional Value-at-Risk from the distribution is regarded as a safety measure to judge the constraint satisfaction, which guides the change of adaptive safety weights to achieve a trade-off between reward and safety. As a result, we can optimize policies under the premise that their worst-case performance satisfies the constraints. The empirical analysis shows that our algorithm attains better risk control compared to expectation-based methods.

KW - Reinforcement Learning

M3 - Conference contribution

SP - 10639

EP - 10646

BT - Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI-21)

ER -
