Continuous control for high-dimensional state spaces: An interactive learning approach

Rodrigo Pérez-Dattari; Carlos Celemin; Javier Ruiz-Del-Solar; Jens Kober

doi:10.1109/ICRA.2019.8793675

Learning & Autonomous Control

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

10 Citations (Scopus)

29 Downloads (Pure)

Abstract

Deep Reinforcement Learning (DRL) has become a powerful methodology to solve complex decision-making problems. However, DRL has several limitations when used in real-world problems (e.g., robotics applications). For instance, long training times are required and cannot be accelerated in contrast to simulated environments, and reward functions may be hard to specify/model and/or to compute. Moreover, the transfer of policies learned in a simulator to the real world has limitations (reality gap). On the other hand, machine learning methods that rely on the transfer of human knowledge to an agent have been shown to be time-efficient for obtaining well-performing policies and do not require a reward function. In this context, we analyze the use of human corrective feedback during task execution to learn policies with high-dimensional state spaces, by using the D-COACH framework, and we propose new variants of this framework. D-COACH is a Deep Learning-based extension of COACH (COrrective Advice Communicated by Humans), where humans are able to shape policies through corrective advice. The enhanced version of D-COACH, which is proposed in this paper, largely reduces the time and effort of a human for training a policy. Experimental results validate the efficiency of the D-COACH framework in three different problems (simulated and with real robots), and show that its enhanced version reduces the human training effort considerably, and makes it feasible to learn policies within periods of time in which a DRL agent does not reach any improvement.

Original language	English
Title of host publication	Proceedings of the International Conference on Robotics and Automation, ICRA 2019
Place of Publication	Piscataway, NJ, USA
Publisher	IEEE
Pages	7611-7617
ISBN (Electronic)	978-1-5386-6026-3
DOIs	https://doi.org/10.1109/ICRA.2019.8793675
Publication status	Published - 2019
Event	2019 International Conference on Robotics and Automation, ICRA 2019 - Montreal, Canada. Duration: 20 May 2019 → 24 May 2019

Conference

Conference	2019 International Conference on Robotics and Automation, ICRA 2019
Country/Territory	Canada
City	Montreal
Period	20/05/19 → 24/05/19

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care

Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Access to Document

10.1109/ICRA.2019.8793675

Continuous_Control_for_High-Dimensional_State_Spaces_An_Interactive_Learning_Approach: Final published version, 2.53 MB

Cite this

@inproceedings{7894d21e62bd4fc188ce20caf3ff3fa8,

title = "Continuous control for high-dimensional state spaces: An interactive learning approach",

abstract = "Deep Reinforcement Learning (DRL) has become a powerful methodology to solve complex decision-making problems. However, DRL has several limitations when used in real-world problems (e.g., robotics applications). For instance, long training times are required and cannot be accelerated in contrast to simulated environments, and reward functions may be hard to specify/model and/or to compute. Moreover, the transfer of policies learned in a simulator to the real world has limitations (reality gap). On the other hand, machine learning methods that rely on the transfer of human knowledge to an agent have been shown to be time-efficient for obtaining well-performing policies and do not require a reward function. In this context, we analyze the use of human corrective feedback during task execution to learn policies with high-dimensional state spaces, by using the D-COACH framework, and we propose new variants of this framework. D-COACH is a Deep Learning-based extension of COACH (COrrective Advice Communicated by Humans), where humans are able to shape policies through corrective advice. The enhanced version of D-COACH, which is proposed in this paper, largely reduces the time and effort of a human for training a policy. Experimental results validate the efficiency of the D-COACH framework in three different problems (simulated and with real robots), and show that its enhanced version reduces the human training effort considerably, and makes it feasible to learn policies within periods of time in which a DRL agent does not reach any improvement.",

author = "Rodrigo P{\'e}rez-Dattari and Carlos Celemin and Javier Ruiz-Del-Solar and Jens Kober",

note = "Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.; 2019 International Conference on Robotics and Automation, ICRA 2019 ; Conference date: 20-05-2019 Through 24-05-2019",

year = "2019",

doi = "10.1109/ICRA.2019.8793675",

language = "English",

pages = "7611--7617",

booktitle = "Proceedings of the International Conference on Robotics and Automation, ICRA 2019",

publisher = "IEEE",

address = "Piscataway, NJ, USA",

}

Pérez-Dattari, R, Celemin, C, Ruiz-Del-Solar, J & Kober, J 2019, Continuous control for high-dimensional state spaces: An interactive learning approach. in Proceedings of the International Conference on Robotics and Automation, ICRA 2019., 8793675, IEEE, Piscataway, NJ, USA, pp. 7611-7617, 2019 International Conference on Robotics and Automation, ICRA 2019, Montreal, Canada, 20/05/19. https://doi.org/10.1109/ICRA.2019.8793675

Continuous control for high-dimensional state spaces: An interactive learning approach. / Pérez-Dattari, Rodrigo; Celemin, Carlos; Ruiz-Del-Solar, Javier et al.
Proceedings of the International Conference on Robotics and Automation, ICRA 2019. Piscataway, NJ, USA: IEEE, 2019. p. 7611-7617 8793675.

TY - GEN

T1 - Continuous control for high-dimensional state spaces: An interactive learning approach

T2 - 2019 International Conference on Robotics and Automation, ICRA 2019

AU - Pérez-Dattari, Rodrigo

AU - Celemin, Carlos

AU - Ruiz-Del-Solar, Javier

AU - Kober, Jens

N1 - Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

PY - 2019

Y1 - 2019

AB - Deep Reinforcement Learning (DRL) has become a powerful methodology to solve complex decision-making problems. However, DRL has several limitations when used in real-world problems (e.g., robotics applications). For instance, long training times are required and cannot be accelerated in contrast to simulated environments, and reward functions may be hard to specify/model and/or to compute. Moreover, the transfer of policies learned in a simulator to the real world has limitations (reality gap). On the other hand, machine learning methods that rely on the transfer of human knowledge to an agent have been shown to be time-efficient for obtaining well-performing policies and do not require a reward function. In this context, we analyze the use of human corrective feedback during task execution to learn policies with high-dimensional state spaces, by using the D-COACH framework, and we propose new variants of this framework. D-COACH is a Deep Learning-based extension of COACH (COrrective Advice Communicated by Humans), where humans are able to shape policies through corrective advice. The enhanced version of D-COACH, which is proposed in this paper, largely reduces the time and effort of a human for training a policy. Experimental results validate the efficiency of the D-COACH framework in three different problems (simulated and with real robots), and show that its enhanced version reduces the human training effort considerably, and makes it feasible to learn policies within periods of time in which a DRL agent does not reach any improvement.

UR - http://www.scopus.com/inward/record.url?scp=85071516358&partnerID=8YFLogxK

U2 - 10.1109/ICRA.2019.8793675

DO - 10.1109/ICRA.2019.8793675

M3 - Conference contribution

AN - SCOPUS:85071516358

SP - 7611

EP - 7617

BT - Proceedings of the International Conference on Robotics and Automation, ICRA 2019

PB - IEEE

CY - Piscataway, NJ, USA

Y2 - 20 May 2019 through 24 May 2019

ER -
