Abstract
In recent years, safe reinforcement learning (RL) with the actor-critic structure has gained significant interest for continuous control tasks. However, achieving near-optimal control policies with safety and convergence guarantees remains challenging, and few works have addressed RL algorithms that handle time-varying safety constraints. This article proposes a safe RL algorithm for optimal control of nonlinear systems with time-varying state and control constraints. The algorithm's novelty lies in two key aspects. First, it introduces a barrier force-based control policy structure that ensures control safety during learning. Second, it employs a multistep policy evaluation mechanism that predicts the policy's safety risk under time-varying constraints and guides safe policy updates. Theoretical results on learning convergence, stability, and robustness are proven. The proposed algorithm outperforms several state-of-the-art RL algorithms in the simulated Safety Gym environment. It is also applied to the real-world problem of integrated path following and collision avoidance for two intelligent vehicles: a differential-drive vehicle and an Ackermann-steered vehicle. The experimental results demonstrate the sim-to-real transfer capability of our approach, as well as satisfactory online control performance.
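A minimal sketch of how the two ideas described above might look in code is given below, assuming a generic continuous-control setting. The function names (`barrier_force`, `safe_action`, `multistep_safety_check`), the finite-difference gradient, the rollout horizon `H`, and the one-step dynamics model are illustrative assumptions and not the paper's published implementation.

```python
import numpy as np

# Illustrative sketch only: the names and forms below are assumed, not taken
# from the paper. constraint_fn(x, t) >= 0 is assumed to define the
# time-varying safe set, with the margin shrinking toward zero at the boundary.

def barrier_force(x, t, constraint_fn, gain=1.0, eps=1e-6):
    """Repulsive 'barrier force' that grows as the state x nears the boundary
    of the time-varying constraint (assumed form; requires x inside the safe set)."""
    h = constraint_fn(x, t)                      # signed margin to the constraint
    grad = np.zeros_like(x)                      # finite-difference gradient of the margin
    for i in range(len(x)):
        dx = np.zeros_like(x)
        dx[i] = 1e-4
        grad[i] = (constraint_fn(x + dx, t) - h) / 1e-4
    return gain * grad / (h + eps)               # pushes the action away from the boundary

def safe_action(x, t, nominal_policy, constraint_fn):
    """Barrier force-based policy structure: learned nominal action plus barrier term."""
    return nominal_policy(x, t) + barrier_force(x, t, constraint_fn)

def multistep_safety_check(x0, t0, policy, dynamics, constraint_fn, H=10, dt=0.05):
    """Multistep policy evaluation (assumed form): roll the candidate policy
    forward H steps and flag a safety risk if any predicted state violates
    the time-varying constraint."""
    x, t = np.array(x0, dtype=float), t0
    for _ in range(H):
        u = policy(x, t)
        x = x + dt * dynamics(x, u)              # one-step Euler prediction
        t = t + dt
        if constraint_fn(x, t) < 0.0:            # predicted violation at a future step
            return False
    return True
```

In this reading, the barrier term plays the role of the barrier force that steers actions away from constraint boundaries during learning, while the H-step rollout mirrors a multistep evaluation that anticipates violations of constraints that move over time before a policy update is accepted.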
Original language | English |
---|---|
Pages (from-to) | 12744-12753 |
Number of pages | 10 |
Journal | IEEE Transactions on Industrial Electronics |
Volume | 71 |
Issue number | 10 |
DOIs | |
Publication status | Published - 2024 |
Keywords
- Barrier force
- Convergence
- Heuristic algorithms
- multistep policy evaluation
- Optimal control
- Reinforcement learning
- safe reinforcement learning (RL)
- Safety
- time-varying constraints
- Time-varying systems
- Vehicle dynamics