Human corrective advice in the policy search loop

Carlos Celemin, Guilherme Maeda, Jens Kober, Javier Ruiz-del-Solar

Research output: Contribution to conference › Abstract › Scientific

Abstract

Machine Learning methods applied to decision-making problems with real robots usually suffer from slow convergence, due both to the dimensionality of the search space and to difficulties in reward design. Interactive Machine Learning (IML) and Learning from Demonstration (LfD) methods are usually simple and relatively fast at improving a policy, but have the drawback of being sensitive to the occasional erroneous feedback that is inherent to human teachers. Reinforcement Learning (RL) methods may converge to solutions that are optimal with respect to the encoded reward function, but they become inefficient as the dimensionality of the state-action space grows.
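The abstract motivates combining human corrective feedback with policy search over movement primitives. As an illustration only, the sketch below shows a COACH-style binary-advice loop acting on the weights of a radial-basis-function policy, one plausible movement-primitive form. The class and function names, the RBF parameterization, the learning rate, and the simulated (occasionally erroneous) teacher are all assumptions made for this sketch, not the authors' implementation, and the policy-search/RL side of the combination is omitted.

```python
# A minimal, illustrative sketch (not the authors' implementation) of a
# COACH-style corrective-advice loop over a simple movement-primitive-like
# policy: a linear combination of radial basis functions over a phase variable.
import numpy as np

np.random.seed(0)  # reproducible simulated teacher noise

class CorrectivePolicy:
    def __init__(self, n_basis=10, learning_rate=0.05):
        self.centers = np.linspace(0.0, 1.0, n_basis)  # RBF centers over phase
        self.width = 0.1                               # shared RBF width (assumed)
        self.weights = np.zeros(n_basis)               # policy parameters
        self.lr = learning_rate

    def features(self, phase):
        """Normalized RBF activations at a phase in [0, 1]."""
        phi = np.exp(-0.5 * ((phase - self.centers) / self.width) ** 2)
        return phi / phi.sum()

    def action(self, phase):
        """Deterministic action: linear combination of basis features."""
        return self.weights @ self.features(phase)

    def apply_advice(self, phase, h):
        """Shift the action at this phase in the binary direction h in {-1, +1}
        indicated by the teacher ('increase' / 'decrease')."""
        self.weights += self.lr * h * self.features(phase)

def simulated_teacher(observed, desired, tolerance=0.02, error_rate=0.1):
    """Stand-in for a human: binary advice that is occasionally wrong,
    mirroring the erroneous feedback the abstract warns about."""
    if abs(observed - desired) < tolerance:
        return 0  # close enough, no correction
    h = 1 if desired > observed else -1
    return h if np.random.rand() > error_rate else -h

# Interactive loop: execute the policy, collect corrections, update in place.
policy = CorrectivePolicy()
target = lambda t: np.sin(2 * np.pi * t)  # hypothetical desired trajectory
for episode in range(200):
    for t in np.linspace(0.0, 1.0, 50):
        h = simulated_teacher(policy.action(t), target(t))
        if h != 0:
            policy.apply_advice(t, h)

phases = np.linspace(0.0, 1.0, 50)
mse = np.mean([(policy.action(t) - target(t)) ** 2 for t in phases])
print(f"mean squared tracking error after advice: {mse:.4f}")
```

Because each correction scales the normalized basis activations, a single piece of advice shifts the action most strongly at the current phase and generalizes smoothly to nearby phases, which is what makes such updates fast relative to reward-driven search.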
Original language: English
Number of pages: 2
Publication status: Published - 2017
Event: IROS 2017: IEEE/RSJ International Conference on Intelligent Robots and Systems - Vancouver, Canada
Duration: 24 Sept 2017 – 28 Sept 2017
http://www.iros2017.org/

Conference

Conference: IROS 2017: IEEE/RSJ International Conference on Intelligent Robots and Systems
Country/Territory: Canada
City: Vancouver
Period: 24/09/17 – 28/09/17
Internet address: http://www.iros2017.org/

Keywords

  • Reinforcement learning
  • Learning from demonstration
  • Interactive machine learning
  • Movement primitives
