Human corrective advice in the policy search loop

Carlos Celemin, Guilherme Maeda, Jens Kober, Javier Ruiz-del-Solar

Research output: Contribution to conference › Abstract › Scientific

Abstract

Machine Learning methods applied to decision-making problems with real robots usually suffer from slow convergence due to the dimensionality of the search space and difficulties in reward design. Interactive Machine Learning (IML) and Learning from Demonstrations (LfD) methods are usually simple and relatively fast at improving a policy, but have the drawback of being sensitive to the occasional erroneous feedback inherent to human teachers. Reinforcement Learning (RL) methods may converge to solutions that are optimal with respect to the encoded reward function, but they become inefficient as the dimensionality of the state-action space grows.
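As a rough illustration of the idea in the title, the combination of policy search with human corrective advice can be sketched as a parameterized policy whose weights are nudged by occasional binary corrections from a teacher. This is a hypothetical minimal sketch, not the authors' actual algorithm; the feature map, learning rate, and `AdvisedPolicy` class are illustrative assumptions.

```python
import numpy as np

def features(state):
    # Toy polynomial feature map; a real system might use radial basis
    # functions or movement-primitive basis functions instead.
    return np.array([1.0, state, state ** 2])

class AdvisedPolicy:
    """Linear policy updated by COACH-style binary human corrections."""

    def __init__(self, n_features, learning_rate=0.1):
        self.w = np.zeros(n_features)
        self.lr = learning_rate

    def action(self, state):
        return float(self.w @ features(state))

    def correct(self, state, advice):
        # advice in {-1, +1}: the teacher signals that the action at this
        # state should be lower or higher; nudge the weights accordingly.
        self.w += self.lr * advice * features(state)

# Inside a policy-search loop, such corrections can steer exploration
# toward promising regions before (or alongside) reward-driven updates.
policy = AdvisedPolicy(n_features=3)
before = policy.action(0.5)
policy.correct(0.5, advice=+1)   # teacher: "increase the action here"
after = policy.action(0.5)
```

The appeal of this scheme, as the abstract suggests, is that each human correction is cheap and immediately informative, while the underlying policy-search machinery remains free to optimize the encoded reward.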
Original language: English
Number of pages: 2
Publication status: Published - 2017
Event: IROS 2017: IEEE/RSJ International Conference on Intelligent Robots and Systems - Vancouver, Canada
Duration: 24 Sep 2017 – 28 Sep 2017
http://www.iros2017.org/

Conference

Conference: IROS 2017: IEEE/RSJ International Conference on Intelligent Robots and Systems
Country: Canada
City: Vancouver
Period: 24/09/17 – 28/09/17
Internet address: http://www.iros2017.org/

Keywords

  • reinforcement learning
  • learning from demonstration
  • interactive machine learning
  • movement primitives
