A fast hybrid reinforcement learning framework with human corrective feedback

Carlos Celemin; Javier Ruiz-del-Solar; Jens Kober

doi:10.1007/s10514-018-9786-6

Carlos Celemin^*, Javier Ruiz-del-Solar, Jens Kober

^*Corresponding author for this work

Learning & Autonomous Control

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Reinforcement Learning agents can be supported by feedback from human teachers in the learning loop that guides the learning process. In this work we propose two hybrid strategies of Policy Search Reinforcement Learning and Interactive Machine Learning that benefit from both sources of information, the cost function and the human corrective feedback, for accelerating the convergence and improving the final performance of the learning process. Experiments with simulated and real systems of balancing tasks and a 3 DoF robot arm validate the advantages of the proposed learning strategies: (i) they speed up the convergence of the learning process between 3 and 30 times, saving considerable time during the agent adaptation, and (ii) they allow including non-expert feedback because they have low sensibility to erroneous human advice.

Original language	English
Pages (from-to)	1173-1186
Journal	Autonomous Robots
Volume	43 (2019)
Issue number	5
DOIs	https://doi.org/10.1007/s10514-018-9786-6
Publication status	Published - 2018

Keywords

Interactive machine learning
Learning from demonstration
Policy search
Reinforcement learning

Access to Document

10.1007/s10514-018-9786-6

Celemin2018_Article_AFastHybridReinforcementLearni: Final published version, 2.18 MB. Licence: CC BY

Cite this

@article{753b8fade98f4c2db95967ec56fe4bc1,

title = "A fast hybrid reinforcement learning framework with human corrective feedback",

abstract = "Reinforcement Learning agents can be supported by feedback from human teachers in the learning loop that guides the learning process. In this work we propose two hybrid strategies of Policy Search Reinforcement Learning and Interactive Machine Learning that benefit from both sources of information, the cost function and the human corrective feedback, for accelerating the convergence and improving the final performance of the learning process. Experiments with simulated and real systems of balancing tasks and a 3 DoF robot arm validate the advantages of the proposed learning strategies: (i) they speed up the convergence of the learning process between 3 and 30 times, saving considerable time during the agent adaptation, and (ii) they allow including non-expert feedback because they have low sensibility to erroneous human advice.",

keywords = "Interactive machine learning, Learning from demonstration, Policy search, Reinforcement learning",

author = "Carlos Celemin and Javier Ruiz-del-Solar and Jens Kober",

year = "2018",

doi = "10.1007/s10514-018-9786-6",

language = "English",

volume = "43",

pages = "1173--1186",

journal = "Autonomous Robots",

issn = "0929-5593",

publisher = "Springer",

number = "5",

}

TY - JOUR

T1 - A fast hybrid reinforcement learning framework with human corrective feedback

AU - Celemin, Carlos

AU - Ruiz-del-Solar, Javier

AU - Kober, Jens

PY - 2018

Y1 - 2018

AB - Reinforcement Learning agents can be supported by feedback from human teachers in the learning loop that guides the learning process. In this work we propose two hybrid strategies of Policy Search Reinforcement Learning and Interactive Machine Learning that benefit from both sources of information, the cost function and the human corrective feedback, for accelerating the convergence and improving the final performance of the learning process. Experiments with simulated and real systems of balancing tasks and a 3 DoF robot arm validate the advantages of the proposed learning strategies: (i) they speed up the convergence of the learning process between 3 and 30 times, saving considerable time during the agent adaptation, and (ii) they allow including non-expert feedback because they have low sensibility to erroneous human advice.

KW - Interactive machine learning

KW - Learning from demonstration

KW - Policy search

KW - Reinforcement learning

UR - http://resolver.tudelft.nl/uuid:753b8fad-e98f-4c2d-b959-67ec56fe4bc1

UR - http://www.scopus.com/inward/record.url?scp=85051668828&partnerID=8YFLogxK

U2 - 10.1007/s10514-018-9786-6

DO - 10.1007/s10514-018-9786-6

M3 - Article

SN - 0929-5593

VL - 43

SP - 1173

EP - 1186

JO - Autonomous Robots

JF - Autonomous Robots

IS - 5

ER -
