Abstract
Safety is critical to broadening the application of reinforcement learning (RL). Often, RL agents are trained in a controlled environment, such as a laboratory, before being deployed in the real world. However, the target reward might be unknown prior to deployment. Reward-free RL addresses this problem by training an agent without the reward to adapt quickly once the reward is revealed.
We consider the constrained reward-free setting, where an agent (the guide) learns to explore safely without the reward signal. This agent is trained in a controlled environment, which allows unsafe interactions and still provides the safety signal. After the target task is revealed, safety violations are not allowed anymore. Thus, the guide is leveraged to compose a safe sampling policy. Drawing from transfer learning, we also regularize a target policy (the student) towards the guide while the student is unreliable and gradually eliminate the influence from the guide as training progresses. The empirical analysis shows that this method can achieve safe transfer learning and helps the student solve the target task faster.
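The idea of regularizing the student towards the guide and then phasing the guide out can be illustrated with a minimal sketch. The abstract does not specify the regularizer or the schedule, so the KL-divergence penalty, the linear decay, and all function names below (`kl_divergence`, `guide_weight`, `regularized_loss`) are assumptions made purely for illustration, not the authors' implementation.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete action distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def guide_weight(step, total_steps, initial_weight=1.0):
    """Hypothetical schedule: linearly decay the guide's influence to zero."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return initial_weight * (1.0 - frac)


def regularized_loss(task_loss, student_probs, guide_probs, step, total_steps):
    """Task loss plus a decaying penalty for deviating from the guide policy."""
    beta = guide_weight(step, total_steps)
    return task_loss + beta * kl_divergence(student_probs, guide_probs)


if __name__ == "__main__":
    student = [0.5, 0.5]  # student's action probabilities in some state
    guide = [0.9, 0.1]    # guide's action probabilities in the same state
    for step in (0, 50, 100):
        print(step, regularized_loss(2.0, student, guide, step, total_steps=100))
```

Early in training the penalty pulls the student's action distribution towards the guide's; by the final step the weight reaches zero and only the task loss remains.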
Original language | English |
---|---|
Title of host publication | Proceedings of the Adaptive and Learning Agents Workshop |
Editors | Hayes Cruz, Santos da Silva |
Number of pages | 14 |
Publication status | Published - 2022 |
Event | Adaptive and Learning Agents Workshop at AAMAS 2022 - Duration: 9 May 2022 → 10 Jul 2022 |
Workshop
Workshop | Adaptive and Learning Agents Workshop at AAMAS 2022 |
---|---|
Abbreviated title | ALA 2022 |
Period | 9/05/22 → 10/07/22 |