Reinforcement learning of motor skills using Policy Search and human corrective advice

Carlos Celemin; Guilherme Maeda; Javier Ruiz-del-Solar; Jan Peters; Jens Kober

doi:10.1177/0278364919871998

Corresponding author: Carlos Celemin

Learning & Autonomous Control

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Robot learning problems are limited by physical constraints, which make learning successful policies for complex motor skills on real systems unfeasible. Some reinforcement learning methods, like Policy Search, offer stable convergence toward locally optimal solutions, whereas interactive machine learning or learning-from-demonstration methods allow fast transfer of human knowledge to the agents. However, most methods require expert demonstrations. In this work, we propose the use of human corrective advice in the actions domain for learning motor trajectories. Additionally, we combine this human feedback with reward functions in a Policy Search learning scheme. The use of both sources of information speeds up the learning process, since the intuitive knowledge of the human teacher can be easily transferred to the agent, while the Policy Search method with the cost/reward function takes over for supervising the process and reducing the influence of occasional wrong human corrections. This interactive approach has been validated for learning movement primitives with simulated arms with several degrees of freedom in reaching via-point movements, and also using real robots in such tasks as “writing characters” and the ball-in-a-cup game. Compared with standard reinforcement learning without human advice, the results show that the proposed method not only converges to higher rewards when learning movement primitives, but also speeds up learning by a factor of 4–40, depending on the task.

Original language	English
Pages (from-to)	1560-1580
Journal	International Journal of Robotics Research
Volume	38
Issue number	14
DOIs	https://doi.org/10.1177/0278364919871998
Publication status	Published - 2019

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care

Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

interactive machine learning
Learning from Demonstrations
motor skills
movement primitives
Policy Search
Reinforcement learning

Cite this

@article{a340c6173df748609f0a5acdcc70d1c6,
  title = "Reinforcement learning of motor skills using Policy Search and human corrective advice",
  abstract = "Robot learning problems are limited by physical constraints, which make learning successful policies for complex motor skills on real systems unfeasible. Some reinforcement learning methods, like Policy Search, offer stable convergence toward locally optimal solutions, whereas interactive machine learning or learning-from-demonstration methods allow fast transfer of human knowledge to the agents. However, most methods require expert demonstrations. In this work, we propose the use of human corrective advice in the actions domain for learning motor trajectories. Additionally, we combine this human feedback with reward functions in a Policy Search learning scheme. The use of both sources of information speeds up the learning process, since the intuitive knowledge of the human teacher can be easily transferred to the agent, while the Policy Search method with the cost/reward function takes over for supervising the process and reducing the influence of occasional wrong human corrections. This interactive approach has been validated for learning movement primitives with simulated arms with several degrees of freedom in reaching via-point movements, and also using real robots in such tasks as “writing characters” and the ball-in-a-cup game. Compared with standard reinforcement learning without human advice, the results show that the proposed method not only converges to higher rewards when learning movement primitives, but also speeds up learning by a factor of 4–40, depending on the task.",
  keywords = "interactive machine learning, Learning from Demonstrations, motor skills, movement primitives, Policy Search, Reinforcement learning",
  author = "Carlos Celemin and Guilherme Maeda and Javier Ruiz-del-Solar and Jan Peters and Jens Kober",
  note = "Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.",
  year = "2019",
  doi = "10.1177/0278364919871998",
  language = "English",
  volume = "38",
  pages = "1560--1580",
  journal = "International Journal of Robotics Research",
  issn = "0278-3649",
  publisher = "SAGE Publishing",
  number = "14",
}

TY  - JOUR
T1  - Reinforcement learning of motor skills using Policy Search and human corrective advice
AU  - Celemin, Carlos
AU  - Maeda, Guilherme
AU  - Ruiz-del-Solar, Javier
AU  - Peters, Jan
AU  - Kober, Jens
N1  - Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
PY  - 2019
Y1  - 2019
AB  - Robot learning problems are limited by physical constraints, which make learning successful policies for complex motor skills on real systems unfeasible. Some reinforcement learning methods, like Policy Search, offer stable convergence toward locally optimal solutions, whereas interactive machine learning or learning-from-demonstration methods allow fast transfer of human knowledge to the agents. However, most methods require expert demonstrations. In this work, we propose the use of human corrective advice in the actions domain for learning motor trajectories. Additionally, we combine this human feedback with reward functions in a Policy Search learning scheme. The use of both sources of information speeds up the learning process, since the intuitive knowledge of the human teacher can be easily transferred to the agent, while the Policy Search method with the cost/reward function takes over for supervising the process and reducing the influence of occasional wrong human corrections. This interactive approach has been validated for learning movement primitives with simulated arms with several degrees of freedom in reaching via-point movements, and also using real robots in such tasks as “writing characters” and the ball-in-a-cup game. Compared with standard reinforcement learning without human advice, the results show that the proposed method not only converges to higher rewards when learning movement primitives, but also speeds up learning by a factor of 4–40, depending on the task.
KW  - interactive machine learning
KW  - Learning from Demonstrations
KW  - motor skills
KW  - movement primitives
KW  - Policy Search
KW  - Reinforcement learning
UR  - http://www.scopus.com/inward/record.url?scp=85073830851&partnerID=8YFLogxK
U2  - 10.1177/0278364919871998
DO  - 10.1177/0278364919871998
M3  - Article
SN  - 0278-3649
VL  - 38
SP  - 1560
EP  - 1580
JO  - International Journal of Robotics Research
JF  - International Journal of Robotics Research
IS  - 14
ER  - 
