Risk Aversion and Guided Exploration in Safety-Constrained Reinforcement Learning

Q. Yang

doi:10.4233/uuid:ca5a81c2-f895-4638-bce5-1423a5943381

Algorithmics

Research output: Thesis › Dissertation (TU Delft)

Abstract

In traditional reinforcement learning (RL) problems, agents explore environments to learn optimal policies through trial and error that is sometimes unsafe. However, unsafe interactions with the environment are unacceptable in many safety-critical problems, for instance in robot navigation tasks. Even though RL agents can be trained in simulators, many real-world problems lack simulators of sufficient fidelity. Constructing safe exploration algorithms for dangerous environments is challenging because policies must be optimized under the premise of safety. In general, safety is still an open problem that hinders the wider application of RL.

Original language	English
Qualification	Doctor of Philosophy
Awarding Institution	Delft University of Technology
Supervisors/Advisors	Spaan, M.T.J. (Supervisor); Tindemans, Simon H. (Advisor)
Award date	23 Jun 2023
Electronic ISBNs	978-94-6384-458-1
DOIs	https://doi.org/10.4233/uuid:ca5a81c2-f895-4638-bce5-1423a5943381
Publication status	Published - 2023

Keywords

Reinforcement Learning (RL)
constrained optimization
quantile regression
task-agnostic exploration

Access to Document

10.4233/uuid:ca5a81c2-f895-4638-bce5-1423a5943381

Dissertation_QisongYang (1): Final published version, 28 MB
PhD_propositions_qisong: Other version, 30.1 KB, Licence: Unspecified

Cite this

@phdthesis{ca5a81c2f8954638bce51423a5943381,
title = "Risk Aversion and Guided Exploration in Safety-Constrained Reinforcement Learning",
abstract = "In traditional reinforcement learning (RL) problems, agents explore environments to learn optimal policies through trial and error that is sometimes unsafe. However, unsafe interactions with the environment are unacceptable in many safety-critical problems, for instance in robot navigation tasks. Even though RL agents can be trained in simulators, many real-world problems lack simulators of sufficient fidelity. Constructing safe exploration algorithms for dangerous environments is challenging because policies must be optimized under the premise of safety. In general, safety is still an open problem that hinders the wider application of RL.",
keywords = "Reinforcement Learning (RL), constrained optimization, quantile regression, task-agnostic exploration",
author = "Q. Yang",
year = "2023",
doi = "10.4233/uuid:ca5a81c2-f895-4638-bce5-1423a5943381",
language = "English",
type = "Dissertation (TU Delft)",
school = "Delft University of Technology",
}

TY - THES
T1 - Risk Aversion and Guided Exploration in Safety-Constrained Reinforcement Learning
AU - Yang, Q.
PY - 2023
Y1 - 2023
AB - In traditional reinforcement learning (RL) problems, agents explore environments to learn optimal policies through trial and error that is sometimes unsafe. However, unsafe interactions with the environment are unacceptable in many safety-critical problems, for instance in robot navigation tasks. Even though RL agents can be trained in simulators, many real-world problems lack simulators of sufficient fidelity. Constructing safe exploration algorithms for dangerous environments is challenging because policies must be optimized under the premise of safety. In general, safety is still an open problem that hinders the wider application of RL.
KW - Reinforcement Learning (RL)
KW - constrained optimization
KW - quantile regression
KW - task-agnostic exploration
U2 - 10.4233/uuid:ca5a81c2-f895-4638-bce5-1423a5943381
DO - 10.4233/uuid:ca5a81c2-f895-4638-bce5-1423a5943381
M3 - Dissertation (TU Delft)
ER -
