CEM: Constrained Entropy Maximization for Task-Agnostic Safe Exploration

Q. Yang; M.T.J. Spaan

* Corresponding author for this work

Algorithmics

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

Without an assigned task, a suitable intrinsic objective for an agent is to explore the environment efficiently. However, the pursuit of exploration will inevitably bring more safety risks.
An under-explored aspect of reinforcement learning is how to achieve safe efficient exploration when the task is unknown.
In this paper, we propose a practical Constrained Entropy Maximization (CEM) algorithm to solve task-agnostic safe exploration problems, which naturally require a finite horizon and undiscounted constraints on safety costs.
The CEM algorithm aims to learn a policy that maximizes the state entropy under the premise of safety.
To avoid approximating the state density in complex domains, CEM leverages a $k$-nearest neighbor entropy estimator to evaluate the efficiency of exploration.
In terms of safety, CEM minimizes the safety costs, and adaptively trades off safety and exploration based on the current constraint satisfaction. We empirically show that CEM allows learning a safe exploration policy in complex continuous-control domains, and the learned policy benefits downstream tasks in safety and sample efficiency.
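The abstract names a $k$-nearest neighbor entropy estimator but does not give its exact form. As an illustration only, the following is a minimal sketch of one standard estimator of this kind (Kozachenko-Leonenko), written in plain Python. The function names `digamma_int` and `knn_entropy`, the default `k=3`, and the brute-force neighbor search are assumptions made for this example and are not taken from the paper.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def knn_entropy(points, k=3):
    """Kozachenko-Leonenko estimate (in nats) of differential entropy.

    points: list of equal-length tuples, e.g. visited states.
    k: which nearest neighbor to use (hypothetical default).
    """
    n = len(points)
    d = len(points[0])
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    # Log-volume of the unit ball in d dimensions.
    log_unit_ball = (d / 2.0) * math.log(math.pi) - math.lgamma(d / 2.0 + 1.0)
    log_dist_sum = 0.0
    for i, p in enumerate(points):
        # Brute-force distances to every other point; fine for small samples.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        # Clamp to avoid log(0) when points coincide.
        log_dist_sum += math.log(max(dists[k - 1], 1e-12))
    return digamma_int(n) - digamma_int(k) + log_unit_ball + d * log_dist_sum / n


# Usage: states spread over the unit square vs. the same states shrunk 10x.
rng = random.Random(0)
wide = [(rng.random(), rng.random()) for _ in range(300)]
narrow = [(0.1 * x, 0.1 * y) for x, y in wide]
print(knn_entropy(wide), knn_entropy(narrow))
```

Because the estimate depends only on distances between sampled states, no explicit density model is fitted. Shrinking every coordinate by a factor of 10 lowers the estimate by d·ln 10, which is the sense in which a larger value indicates broader state coverage.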

Original language	English
Title of host publication	The Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23)
Number of pages	9
Publication status	Published - 2023
Event	37th AAAI Conference on Artificial Intelligence - Washington, United States
Duration	7 Feb 2023 → 14 Feb 2023
Conference number	37

Conference

Conference	37th AAAI Conference on Artificial Intelligence
Abbreviated title	AAAI-23
Country/Territory	United States
City	Washington
Period	7/02/23 → 14/02/23

Bibliographical note

Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

Reinforcement Learning
Safe Exploration

Access to Document

26281-Article Text-30344-1-2-20230626 (Final published version, 5.25 MB)

Cite this

@inproceedings{e473c39be97744febf01d94292f713db,

title = "CEM: Constrained Entropy Maximization for Task-Agnostic Safe Exploration",

abstract = "Without an assigned task, a suitable intrinsic objective for an agent is to explore the environment efficiently. However, the pursuit of exploration will inevitably bring more safety risks. An under-explored aspect of reinforcement learning is how to achieve safe efficient exploration when the task is unknown. In this paper, we propose a practical Constrained Entropy Maximization (CEM) algorithm to solve task-agnostic safe exploration problems, which naturally require a finite horizon and undiscounted constraints on safety costs. The CEM algorithm aims to learn a policy that maximizes the state entropy under the premise of safety. To avoid approximating the state density in complex domains, CEM leverages a $k$-nearest neighbor entropy estimator to evaluate the efficiency of exploration. In terms of safety, CEM minimizes the safety costs, and adaptively trades off safety and exploration based on the current constraint satisfaction. We empirically show that CEM allows learning a safe exploration policy in complex continuous-control domains, and the learned policy benefits downstream tasks in safety and sample efficiency.",

keywords = "Reinforcement Learning, Safe Exploration",

author = "Q. Yang and M.T.J. Spaan",

note = "Green Open Access added to TU Delft Institutional Repository {\textquoteleft}You share, we take care!{\textquoteright} – Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public. ; 37th AAAI Conference on Artificial Intelligence, AAAI-23 ; Conference date: 07-02-2023 Through 14-02-2023",

year = "2023",

language = "English",

booktitle = "The Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23)",

}

TY - GEN

T1 - CEM: Constrained Entropy Maximization for Task-Agnostic Safe Exploration

AU - Yang, Q.

AU - Spaan, M.T.J.

N1 - Conference code: 37

PY - 2023

Y1 - 2023

AB - Without an assigned task, a suitable intrinsic objective for an agent is to explore the environment efficiently. However, the pursuit of exploration will inevitably bring more safety risks. An under-explored aspect of reinforcement learning is how to achieve safe efficient exploration when the task is unknown. In this paper, we propose a practical Constrained Entropy Maximization (CEM) algorithm to solve task-agnostic safe exploration problems, which naturally require a finite horizon and undiscounted constraints on safety costs. The CEM algorithm aims to learn a policy that maximizes the state entropy under the premise of safety. To avoid approximating the state density in complex domains, CEM leverages a $k$-nearest neighbor entropy estimator to evaluate the efficiency of exploration. In terms of safety, CEM minimizes the safety costs, and adaptively trades off safety and exploration based on the current constraint satisfaction. We empirically show that CEM allows learning a safe exploration policy in complex continuous-control domains, and the learned policy benefits downstream tasks in safety and sample efficiency.

KW - Reinforcement Learning

KW - Safe Exploration

M3 - Conference contribution

BT - The Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23)

T2 - 37th AAAI Conference on Artificial Intelligence

Y2 - 7 February 2023 through 14 February 2023

ER -
