Abstract
This work extends and compares recent model- and learning-based methodologies for optimal control with input saturation. We focus on two methodologies: a model-based actor-critic (MBAC) strategy and a nonlinear policy iteration strategy. To evaluate their performance, both strategies are applied to the swing-up of an inverted pendulum. Numerical simulations show that the neural-network approximation in the MBAC strategy can be poor, and the algorithm may converge far from the optimum. In the MBAC approach neither stabilization nor monotonic convergence can be guaranteed, and the best value function does not always correspond to the last one found. By contrast, the nonlinear policy iteration approach guarantees that every new control policy is stabilizing and generally leads to a monotonically decreasing cost.
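The monotone-improvement property attributed to policy iteration in the abstract can be illustrated on a small example. The paper treats continuous nonlinear dynamics with input saturation; the sketch below instead uses a hypothetical tabular MDP (states, actions, costs, and transition probabilities are all invented for illustration) to show the generic mechanism: alternating exact policy evaluation with greedy policy improvement yields a cost-to-go that never increases between iterations.

```python
import numpy as np

# Hypothetical toy MDP (not from the paper): 3 states, 2 actions.
# P[a, s, s'] are transition probabilities, c[a, s] are stage costs.
P = np.array([
    [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]],
    [[0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.5, 0.0, 0.5]],
])
c = np.array([
    [2.0, 2.0, 0.0],
    [1.0, 1.0, 1.0],
])
gamma = 0.95  # discount factor

def evaluate(policy):
    # Exact policy evaluation: solve (I - gamma * P_pi) V = c_pi.
    P_pi = P[policy, np.arange(3)]
    c_pi = c[policy, np.arange(3)]
    return np.linalg.solve(np.eye(3) - gamma * P_pi, c_pi)

policy = np.zeros(3, dtype=int)
costs = []
for _ in range(20):
    V = evaluate(policy)
    costs.append(V.sum())
    # Greedy improvement: pick the action minimizing one-step lookahead cost.
    Q = c + gamma * P @ V          # Q[a, s]
    new_policy = Q.argmin(axis=0)
    if np.array_equal(new_policy, policy):
        break                      # policy is a fixed point: optimal
    policy = new_policy

# Total cost-to-go decreases monotonically across iterations.
assert all(costs[i + 1] <= costs[i] + 1e-9 for i in range(len(costs) - 1))
```

In the continuous, saturated-input setting studied in the paper, the evaluation and improvement steps are no longer exact linear solves and argmins, but the same alternation underlies the guarantee that each new policy is stabilizing with non-increasing cost.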
Original language | English |
---|---|
Title of host publication | Proceedings of the IEEE 16th International Conference on Control and Automation, ICCA 2020 |
Place of Publication | Piscataway, NJ, USA |
Publisher | IEEE |
Pages | 773-778 |
ISBN (Electronic) | 978-1-7281-9093-8 |
DOIs | |
Publication status | Published - 2020 |
Event | 16th IEEE International Conference on Control and Automation, ICCA 2020 - Virtual, Sapporo, Hokkaido, Japan Duration: 9 Oct 2020 → 11 Oct 2020 |
Conference
Conference | 16th IEEE International Conference on Control and Automation, ICCA 2020 |
---|---|
Country/Territory | Japan |
City | Virtual, Sapporo, Hokkaido |
Period | 9/10/20 → 11/10/20 |