TY - JOUR
T1 - A fast hybrid reinforcement learning framework with human corrective feedback
AU - Celemin, Carlos
AU - Ruiz-del-Solar, Javier
AU - Kober, Jens
PY - 2018
Y1 - 2018
N2 - Reinforcement Learning agents can be supported by feedback from human teachers in the learning loop that guides the learning process. In this work we propose two hybrid strategies of Policy Search Reinforcement Learning and Interactive Machine Learning that benefit from both sources of information, the cost function and the human corrective feedback, to accelerate convergence and improve the final performance of the learning process. Experiments with simulated and real systems of balancing tasks and a 3 DoF robot arm validate the advantages of the proposed learning strategies: (i) they speed up the convergence of the learning process by a factor of 3 to 30, saving considerable time during the agent adaptation, and (ii) they can incorporate non-expert feedback because they have low sensitivity to erroneous human advice.
AB - Reinforcement Learning agents can be supported by feedback from human teachers in the learning loop that guides the learning process. In this work we propose two hybrid strategies of Policy Search Reinforcement Learning and Interactive Machine Learning that benefit from both sources of information, the cost function and the human corrective feedback, to accelerate convergence and improve the final performance of the learning process. Experiments with simulated and real systems of balancing tasks and a 3 DoF robot arm validate the advantages of the proposed learning strategies: (i) they speed up the convergence of the learning process by a factor of 3 to 30, saving considerable time during the agent adaptation, and (ii) they can incorporate non-expert feedback because they have low sensitivity to erroneous human advice.
KW - Interactive machine learning
KW - Learning from demonstration
KW - Policy search
KW - Reinforcement learning
UR - http://resolver.tudelft.nl/uuid:753b8fad-e98f-4c2d-b959-67ec56fe4bc1
UR - http://www.scopus.com/inward/record.url?scp=85051668828&partnerID=8YFLogxK
U2 - 10.1007/s10514-018-9786-6
DO - 10.1007/s10514-018-9786-6
M3 - Article
SN - 0929-5593
VL - 43 (2019)
SP - 1173
EP - 1186
JO - Autonomous Robots
JF - Autonomous Robots
IS - 5
ER -