Reinforcement Learning by Guided Safe Exploration

Qisong Yang; Thiago D. Simão; Nils Jansen; Simon H. Tindemans; Matthijs T.J. Spaan

doi:10.3233/FAIA230598

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

Safety is critical to broadening the application of reinforcement learning (RL). Often, we train RL agents in a controlled environment, such as a laboratory, before deploying them in the real world. However, the real-world target task might be unknown prior to deployment. Reward-free RL trains an agent without the reward to adapt quickly once the reward is revealed. We consider the constrained reward-free setting, where an agent (the guide) learns to explore safely without the reward signal. This agent is trained in a controlled environment, which allows unsafe interactions and still provides the safety signal. After the target task is revealed, safety violations are not allowed anymore. Thus, the guide is leveraged to compose a safe behaviour policy. Drawing from transfer learning, we also regularize a target policy (the student) towards the guide while the student is unreliable and gradually eliminate the influence of the guide as training progresses. The empirical analysis shows that this method can achieve safe transfer learning and helps the student solve the target task faster.

Original language	English
Title of host publication	ECAI 2023 - 26th European Conference on Artificial Intelligence, including 12th Conference on Prestigious Applications of Intelligent Systems, PAIS 2023 - Proceedings
Editors	Kobi Gal, Ann Nowe, Grzegorz J. Nalepa, Roy Fairstein, Roxana Radulescu
Pages	2858 - 2865
Number of pages	8
ISBN (Electronic)	9781643684369
DOIs	https://doi.org/10.3233/FAIA230598
Publication status	Published - 2023
Event	26th European Conference on Artificial Intelligence - Kraków, Poland; Duration: 30 Sept 2023 → 4 Oct 2023; Conference number: 26

Publication series

Name	Frontiers in Artificial Intelligence and Applications
Volume	372
ISSN (Print)	0922-6389
ISSN (Electronic)	1879-8314

Conference

Conference	26th European Conference on Artificial Intelligence
Abbreviated title	ECAI 2023
Country/Territory	Poland
City	Kraków
Period	30/09/23 → 4/10/23

Access to Document

10.3233/FAIA230598 (Licence: CC BY-NC)

FAIA-372-FAIA230598: Final published version, 630 KB (Licence: CC BY-NC)

Cite this

Yang, Q., Simão, T. D., Jansen, N., Tindemans, S. H., & Spaan, M. T. J. (2023). Reinforcement Learning by Guided Safe Exploration. In K. Gal, A. Nowe, G. J. Nalepa, R. Fairstein, & R. Radulescu (Eds.), ECAI 2023 - 26th European Conference on Artificial Intelligence, including 12th Conference on Prestigious Applications of Intelligent Systems, PAIS 2023 - Proceedings (pp. 2858 - 2865). (Frontiers in Artificial Intelligence and Applications; Vol. 372). https://doi.org/10.3233/FAIA230598

Yang, Qisong ; Simão, Thiago D. ; Jansen, Nils et al. / Reinforcement Learning by Guided Safe Exploration. ECAI 2023 - 26th European Conference on Artificial Intelligence, including 12th Conference on Prestigious Applications of Intelligent Systems, PAIS 2023 - Proceedings. editor / Kobi Gal ; Ann Nowe ; Grzegorz J. Nalepa ; Roy Fairstein ; Roxana Radulescu. 2023. pp. 2858 - 2865 (Frontiers in Artificial Intelligence and Applications; Vol. 372).

@inproceedings{65bcb73ae4f64e359ab5975461fd2466,

title = "Reinforcement Learning by Guided Safe Exploration",

abstract = "Safety is critical to broadening the application of reinforcement learning (RL). Often, we train RL agents in a controlled environment, such as a laboratory, before deploying them in the real world. However, the real-world target task might be unknown prior to deployment. Reward-free RL trains an agent without the reward to adapt quickly once the reward is revealed. We consider the constrained reward-free setting, where an agent (the guide) learns to explore safely without the reward signal. This agent is trained in a controlled environment, which allows unsafe interactions and still provides the safety signal. After the target task is revealed, safety violations are not allowed anymore. Thus, the guide is leveraged to compose a safe behaviour policy. Drawing from transfer learning, we also regularize a target policy (the student) towards the guide while the student is unreliable and gradually eliminate the influence of the guide as training progresses. The empirical analysis shows that this method can achieve safe transfer learning and helps the student solve the target task faster.",

author = "Qisong Yang and Sim{\~a}o, {Thiago D.} and Nils Jansen and Tindemans, {Simon H.} and Spaan, {Matthijs T.J.}",

year = "2023",

doi = "10.3233/FAIA230598",

language = "English",

series = "Frontiers in Artificial Intelligence and Applications",

volume = "372",

pages = "2858--2865",

editor = "Kobi Gal and Ann Nowe and Nalepa, {Grzegorz J.} and Roy Fairstein and Roxana Radulescu",

booktitle = "ECAI 2023 - 26th European Conference on Artificial Intelligence, including 12th Conference on Prestigious Applications of Intelligent Systems, PAIS 2023 - Proceedings",

note = "26th European Conference on Artificial Intelligence, ECAI 2023 ; Conference date: 30-09-2023 Through 04-10-2023",

}

Yang, Q, Simão, TD, Jansen, N, Tindemans, SH & Spaan, MTJ 2023, Reinforcement Learning by Guided Safe Exploration. in K Gal, A Nowe, GJ Nalepa, R Fairstein & R Radulescu (eds), ECAI 2023 - 26th European Conference on Artificial Intelligence, including 12th Conference on Prestigious Applications of Intelligent Systems, PAIS 2023 - Proceedings. Frontiers in Artificial Intelligence and Applications, vol. 372, pp. 2858 - 2865, 26th European Conference on Artificial Intelligence, Kraków, Poland, 30/09/23. https://doi.org/10.3233/FAIA230598

Reinforcement Learning by Guided Safe Exploration. / Yang, Qisong; Simão, Thiago D.; Jansen, Nils et al.
ECAI 2023 - 26th European Conference on Artificial Intelligence, including 12th Conference on Prestigious Applications of Intelligent Systems, PAIS 2023 - Proceedings. ed. / Kobi Gal; Ann Nowe; Grzegorz J. Nalepa; Roy Fairstein; Roxana Radulescu. 2023. p. 2858 - 2865 (Frontiers in Artificial Intelligence and Applications; Vol. 372).

TY - GEN

T1 - Reinforcement Learning by Guided Safe Exploration

AU - Yang, Qisong

AU - Simão, Thiago D.

AU - Jansen, Nils

AU - Tindemans, Simon H.

AU - Spaan, Matthijs T.J.

N1 - Conference code: 26

PY - 2023

Y1 - 2023

N2 - Safety is critical to broadening the application of reinforcement learning (RL). Often, we train RL agents in a controlled environment, such as a laboratory, before deploying them in the real world. However, the real-world target task might be unknown prior to deployment. Reward-free RL trains an agent without the reward to adapt quickly once the reward is revealed. We consider the constrained reward-free setting, where an agent (the guide) learns to explore safely without the reward signal. This agent is trained in a controlled environment, which allows unsafe interactions and still provides the safety signal. After the target task is revealed, safety violations are not allowed anymore. Thus, the guide is leveraged to compose a safe behaviour policy. Drawing from transfer learning, we also regularize a target policy (the student) towards the guide while the student is unreliable and gradually eliminate the influence of the guide as training progresses. The empirical analysis shows that this method can achieve safe transfer learning and helps the student solve the target task faster.

UR - http://www.scopus.com/inward/record.url?scp=85175803702&partnerID=8YFLogxK

U2 - 10.3233/FAIA230598

DO - 10.3233/FAIA230598

M3 - Conference contribution

T3 - Frontiers in Artificial Intelligence and Applications

SP - 2858

EP - 2865

BT - ECAI 2023 - 26th European Conference on Artificial Intelligence, including 12th Conference on Prestigious Applications of Intelligent Systems, PAIS 2023 - Proceedings

A2 - Gal, Kobi

A2 - Nowe, Ann

A2 - Nalepa, Grzegorz J.

A2 - Fairstein, Roy

A2 - Radulescu, Roxana

T2 - 26th European Conference on Artificial Intelligence

Y2 - 30 September 2023 through 4 October 2023

ER -

Yang Q, Simão TD, Jansen N, Tindemans SH, Spaan MTJ. Reinforcement Learning by Guided Safe Exploration. In Gal K, Nowe A, Nalepa GJ, Fairstein R, Radulescu R, editors, ECAI 2023 - 26th European Conference on Artificial Intelligence, including 12th Conference on Prestigious Applications of Intelligent Systems, PAIS 2023 - Proceedings. 2023. p. 2858 - 2865. (Frontiers in Artificial Intelligence and Applications). doi: 10.3233/FAIA230598
