TY - JOUR
T1 - Model-plant mismatch compensation using reinforcement learning
AU - Koryakovskiy, Ivan
AU - Kudruss, Manuel
AU - Vallery, Heike
AU - Babuska, Robert
AU - Caarls, Wouter
N1 - Accepted Author Manuscript
PY - 2018
Y1 - 2018
AB - Learning-based approaches are suitable for the control of systems with unknown dynamics. However, learning from scratch involves many trials with exploratory actions until a good control policy is discovered. Real robots usually cannot withstand the exploratory actions and suffer damage. This problem can be circumvented by combining learning with model-based control. In this letter, we employ a nominal model-predictive controller that is impeded by the presence of an unknown model-plant mismatch. To compensate for the mismatch, we propose two approaches to combining reinforcement learning with the nominal controller. The first approach learns a compensatory control action that minimizes the same performance measure as the nominal controller. The second approach learns a compensatory signal from the difference between the transition predicted by the internal model and the actual transition. We compare the approaches in simulation on a robot attached to the ground and performing a setpoint-reaching task. We implement the better approach on the real robot and demonstrate successful learning results.
UR - http://resolver.tudelft.nl/uuid:4ea9a42f-9d8a-4080-936e-d603feaac5ab
U2 - 10.1109/LRA.2018.2800106
DO - 10.1109/LRA.2018.2800106
M3 - Article
SN - 2377-3766
VL - 3
SP - 2471
EP - 2477
JO - IEEE Robotics and Automation Letters
JF - IEEE Robotics and Automation Letters
IS - 3
ER -