TY - JOUR
T1 - AC4MPC: Actor-Critic Reinforcement Learning for Guiding Model Predictive Control
AU - Reiter, Rudolf
AU - Ghezzi, Andrea
AU - Baumgartner, Katrin
AU - Hoffmann, Jasper
AU - McAllister, Robert D.
AU - Diehl, Moritz
PY - 2026
AB - Nonlinear model predictive control (MPC) and reinforcement learning (RL) are two powerful control strategies with complementary advantages. This work shows how actor-critic RL techniques can be leveraged to improve the performance of MPC. The RL critic is used as an approximation of the optimal value function, and an actor rollout provides an initial guess for the primal variables of the MPC. A parallel control architecture is proposed in which each MPC instance is solved twice, with two different initial guesses: the actor rollout and a shifted version of the previous solution. The control actions from the lowest-cost trajectory are applied to the system at each time step. We provide theoretical justification for the proposed algorithm by establishing that the discounted closed-loop cost is upper-bounded by that of the original RL actor plus an error term that depends on the (sub)optimality of the RL actor and the accuracy of the critic. These results do not require globally optimal solutions and indicate that longer horizons mitigate the effect of errors in the critic approximation. The proposed algorithm is intended for applications where standard methods for constructing terminal costs or constraints for MPC are impractical. The approach is demonstrated on an illustrative toy example and an autonomous driving overtaking scenario.
KW - Dynamic programming (DP)
KW - model predictive control (MPC)
KW - reinforcement learning (RL)
UR - http://www.scopus.com/inward/record.url?scp=105020446970&partnerID=8YFLogxK
DO - 10.1109/TCST.2025.3620521
M3 - Article
AN - SCOPUS:105020446970
SN - 1063-6536
VL - 34
SP - 395
EP - 410
JF - IEEE Transactions on Control Systems Technology
IS - 1
ER -