Training and Transferring Safe Policies in Reinforcement Learning

Q. Yang*, T. D. Simão*, Nils Jansen, Simon H. Tindemans, M.T.J. Spaan

*Corresponding author for this work

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review


Abstract

Safety is critical to broadening the application of reinforcement learning (RL). Often, RL agents are trained in a controlled environment, such as a laboratory, before being deployed in the real world. However, the target reward might be unknown prior to deployment. Reward-free RL addresses this problem by training an agent without the reward to adapt quickly once the reward is revealed.
We consider the constrained reward-free setting, where an agent (the guide) learns to explore safely without the reward signal. This agent is trained in a controlled environment, which allows unsafe interactions and still provides the safety signal. After the target task is revealed, safety violations are not allowed anymore. Thus, the guide is leveraged to compose a safe sampling policy. Drawing from transfer learning, we also regularize a target policy (the student) towards the guide while the student is unreliable, and gradually eliminate the influence from the guide as training progresses. The empirical analysis shows that this method can achieve safe transfer learning and helps the student solve the target task faster.
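
The abstract's regularization idea (keep the student close to a safe guide early on, then fade the guide out) could be sketched roughly as below. This is a hypothetical illustration, not the authors' implementation: the PyTorch setup, discrete action space, KL penalty, and linear annealing schedule are all assumptions made for the example.

```python
# Hypothetical sketch: student policy loss regularized towards a pre-trained
# safe guide policy, with the guide's influence decaying over training.
import torch
import torch.nn.functional as F


def student_loss(student_logits, guide_logits, actions, advantages, step, total_steps):
    """Policy-gradient loss plus a KL penalty towards the guide.

    student_logits, guide_logits: [batch, n_actions] action logits.
    actions: [batch] sampled actions; advantages: [batch] advantage estimates.
    The KL coefficient is annealed linearly from 1 to 0 over training (assumed schedule).
    """
    log_probs = F.log_softmax(student_logits, dim=-1)
    # Standard policy-gradient term on the target task.
    chosen_log_probs = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    pg_loss = -(chosen_log_probs * advantages).mean()

    # KL(guide || student): penalizes the student for drifting from the safe guide.
    guide_log_probs = F.log_softmax(guide_logits, dim=-1).detach()
    kl = (guide_log_probs.exp() * (guide_log_probs - log_probs)).sum(dim=-1).mean()

    # Linearly decaying coefficient gradually removes the guide's influence.
    beta = max(0.0, 1.0 - step / total_steps)
    return pg_loss + beta * kl
```

With a schedule like this, the student is effectively constrained to the guide's safe behaviour while its own value estimates are still unreliable, and it converges to an unregularized policy-gradient objective once `beta` reaches zero.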
Original language: English
Title of host publication: Proceedings of the Adaptive and Learning Agents Workshop
Editors: Hayes, Cruz, Santos da Silva
Number of pages: 14
Publication status: Published - 2022
Event: Adaptive and Learning Agents Workshop at AAMAS 2022
Duration: 9 May 2022 – 10 Jul 2022

Workshop

Workshop: Adaptive and Learning Agents Workshop at AAMAS 2022
Abbreviated title: ALA 2022
Period: 9/05/22 – 10/07/22
