Learning Optimal Controllers for Linear Systems with Multiplicative Noise via Policy Gradient

Benjamin Gravell; Peyman Mohajerin Esfahani; Tyler H. Summers

doi:10.1109/TAC.2020.3037046

Team Bart De Schutter

Research output: Contribution to journal › Article › Scientific › peer-review

19 Citations (Scopus)

37 Downloads (Pure)

Abstract

The linear quadratic regulator (LQR) problem has reemerged as an important theoretical benchmark for reinforcement learning-based control of complex dynamical systems with continuous state and action spaces. In contrast with nearly all recent work in this area, we consider multiplicative noise models, which are increasingly relevant because they explicitly incorporate inherent uncertainty and variation in the system dynamics and thereby improve robustness properties of the controller. Robustness is a critical and poorly understood issue in reinforcement learning; existing methods which do not account for uncertainty can converge to fragile policies or fail to converge at all. Additionally, intentional injection of multiplicative noise into learning algorithms can enhance robustness of policies, as observed in ad hoc work on domain randomization. Although policy gradient algorithms require optimization of a non-convex cost function, we show that the multiplicative noise LQR cost has a special property called gradient domination, which is exploited to prove global convergence of policy gradient algorithms to the globally optimum control policy with polynomial dependence on problem parameters. Results are provided both in the model-known and model-unknown settings where samples of system trajectories are used to estimate policy gradients.
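
For readers unfamiliar with the term, the gradient domination property named in the abstract can be sketched as below. The notation is assumed for illustration and does not come from this record: a state-feedback gain K, its cost C(K), an optimal gain K^*, a constant \mu, and noise terms \delta_{ti}, \gamma_{tj}. The paper's exact statement, sign conventions, and constants may differ.

```latex
% Assumed setup: state feedback u_t = K x_t, with C(K) the LQR cost of gain K.
% \delta_{ti} and \gamma_{tj} are zero-mean scalar random variables (multiplicative noise).
\begin{align*}
  x_{t+1} &= \Big(A + \sum_{i} \delta_{ti} A_i\Big) x_t
           + \Big(B + \sum_{j} \gamma_{tj} B_j\Big) u_t \\
  % Gradient domination: for some \mu > 0 and every gain K with finite cost,
  C(K) - C(K^*) &\le \mu \, \|\nabla C(K)\|_F^2
\end{align*}
% Consequence: every stationary point (\nabla C(K) = 0) attains the optimal cost,
% even though C is non-convex in K.
```

Under an inequality of this form, a small gradient implies a small optimality gap, which is the kind of property that lets gradient descent on a non-convex cost reach the global optimum.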

Original language	English
Pages (from-to)	5283-5298
Journal	IEEE Transactions on Automatic Control
Volume	66
Issue number	11
DOIs	https://doi.org/10.1109/TAC.2020.3037046
Publication status	Published - 2021

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

Additive noise
Convergence
Covariance matrices
gradient methods
noise
optimal control
Reinforcement learning
Robustness
Stability analysis
Stochastic processes
stochastic systems
uncertain systems
Uncertainty

Access to Document

10.1109/TAC.2020.3037046

Learning_Optimal_Controllers_for_Linear_Systems_With_Multiplicative_Noise_via_Policy_Gradient (Final published version, 1.08 MB)

Cite this

@article{aecd5a3dd42944339fe313080316fd05,

title = "Learning Optimal Controllers for Linear Systems with Multiplicative Noise via Policy Gradient",

abstract = "The linear quadratic regulator (LQR) problem has reemerged as an important theoretical benchmark for reinforcement learning-based control of complex dynamical systems with continuous state and action spaces. In contrast with nearly all recent work in this area, we consider multiplicative noise models, which are increasingly relevant because they explicitly incorporate inherent uncertainty and variation in the system dynamics and thereby improve robustness properties of the controller. Robustness is a critical and poorly understood issue in reinforcement learning; existing methods which do not account for uncertainty can converge to fragile policies or fail to converge at all. Additionally, intentional injection of multiplicative noise into learning algorithms can enhance robustness of policies, as observed in ad hoc work on domain randomization. Although policy gradient algorithms require optimization of a non-convex cost function, we show that the multiplicative noise LQR cost has a special property called gradient domination, which is exploited to prove global convergence of policy gradient algorithms to the globally optimum control policy with polynomial dependence on problem parameters. Results are provided both in the model-known and model-unknown settings where samples of system trajectories are used to estimate policy gradients.",

keywords = "Additive noise, Convergence, Covariance matrices, gradient methods, noise, optimal control, Reinforcement learning, Robustness, Stability analysis, Stochastic processes, stochastic systems, uncertain systems, Uncertainty",

author = "Gravell, Benjamin and {Mohajerin Esfahani}, Peyman and Summers, {Tyler H.}",

note = "Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.",

year = "2021",

doi = "10.1109/TAC.2020.3037046",

language = "English",

volume = "66",

pages = "5283--5298",

journal = "IEEE Transactions on Automatic Control",

issn = "0018-9286",

publisher = "Institute of Electrical and Electronics Engineers (IEEE)",

number = "11",

}

TY - JOUR

T1 - Learning Optimal Controllers for Linear Systems with Multiplicative Noise via Policy Gradient

AU - Gravell, Benjamin

AU - Mohajerin Esfahani, Peyman

AU - Summers, Tyler H.

N1 - Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

PY - 2021

Y1 - 2021

N2 - The linear quadratic regulator (LQR) problem has reemerged as an important theoretical benchmark for reinforcement learning-based control of complex dynamical systems with continuous state and action spaces. In contrast with nearly all recent work in this area, we consider multiplicative noise models, which are increasingly relevant because they explicitly incorporate inherent uncertainty and variation in the system dynamics and thereby improve robustness properties of the controller. Robustness is a critical and poorly understood issue in reinforcement learning; existing methods which do not account for uncertainty can converge to fragile policies or fail to converge at all. Additionally, intentional injection of multiplicative noise into learning algorithms can enhance robustness of policies, as observed in ad hoc work on domain randomization. Although policy gradient algorithms require optimization of a non-convex cost function, we show that the multiplicative noise LQR cost has a special property called gradient domination, which is exploited to prove global convergence of policy gradient algorithms to the globally optimum control policy with polynomial dependence on problem parameters. Results are provided both in the model-known and model-unknown settings where samples of system trajectories are used to estimate policy gradients.

AB - The linear quadratic regulator (LQR) problem has reemerged as an important theoretical benchmark for reinforcement learning-based control of complex dynamical systems with continuous state and action spaces. In contrast with nearly all recent work in this area, we consider multiplicative noise models, which are increasingly relevant because they explicitly incorporate inherent uncertainty and variation in the system dynamics and thereby improve robustness properties of the controller. Robustness is a critical and poorly understood issue in reinforcement learning; existing methods which do not account for uncertainty can converge to fragile policies or fail to converge at all. Additionally, intentional injection of multiplicative noise into learning algorithms can enhance robustness of policies, as observed in ad hoc work on domain randomization. Although policy gradient algorithms require optimization of a non-convex cost function, we show that the multiplicative noise LQR cost has a special property called gradient domination, which is exploited to prove global convergence of policy gradient algorithms to the globally optimum control policy with polynomial dependence on problem parameters. Results are provided both in the model-known and model-unknown settings where samples of system trajectories are used to estimate policy gradients.

KW - Additive noise

KW - Convergence

KW - Covariance matrices

KW - gradient methods

KW - noise

KW - optimal control

KW - Reinforcement learning

KW - Robustness

KW - Stability analysis

KW - Stochastic processes

KW - stochastic systems

KW - uncertain systems

KW - Uncertainty

UR - http://www.scopus.com/inward/record.url?scp=85098797957&partnerID=8YFLogxK

U2 - 10.1109/TAC.2020.3037046

DO - 10.1109/TAC.2020.3037046

M3 - Article

AN - SCOPUS:85098797957

SN - 0018-9286

VL - 66

SP - 5283

EP - 5298

JO - IEEE Transactions on Automatic Control

JF - IEEE Transactions on Automatic Control

IS - 11

ER -
