Evaluation of physical damage associated with action selection strategies in reinforcement learning

Ivan Koryakovskiy; Heike Vallery; Robert Babuška; Wouter Caarls

doi:10.1016/j.ifacol.2017.08.1218

Research output: Contribution to journal › Conference article › Scientific › peer-review

5 Citations (Scopus)

54 Downloads (Pure)

Abstract

Reinforcement learning techniques enable robots to deal with their own dynamics and with unknown environments without using explicit models or preprogrammed behaviors. However, reinforcement learning relies on intrinsically risky exploration, which is often damaging for physical systems. In the case of the bipedal walking robot Leo, which is studied in this paper, two sources of damage can be identified: fatigue of gearboxes due to backlash re-engagements, and the overall system damage due to falls of the robot. We investigate several exploration techniques and compare them in terms of gearbox fatigue, cumulative number of falls and undiscounted return. The results show that exploration with the Ornstein-Uhlenbeck (OU) process noise leads to the highest return, but at the same time it causes the largest number of falls. The Previous Action-Dependent Action (PADA) method results in drastically reduced fatigue, but also a large number of falls. The results reveal a previously unknown trade-off between the two sources of damage. Inspired by the OU and PADA methods, we propose four new action-selection methods in a systematic way. One of the proposed methods with a time-correlated noise outperforms the well-known ε-greedy method in all three benchmarks. We provide guidance towards the choice of exploration strategy for reinforcement learning applications on real physical systems.
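
For readers unfamiliar with the two baseline strategies named in the abstract, the following is a minimal Python sketch of textbook ε-greedy selection and of exploration with time-correlated Ornstein-Uhlenbeck noise. It is not the authors' implementation: the function and class names, the parameters `theta`, `sigma`, `mu`, `dt`, and the action bounds are all assumptions chosen for illustration, and the PADA method and the four proposed methods are not reproduced here.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action index,
    otherwise pick the index with the highest estimated value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


class OUNoise:
    """Discretized Ornstein-Uhlenbeck process (illustrative parameters):
    x <- x + theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1).
    Successive samples are correlated in time, unlike epsilon-greedy draws."""

    def __init__(self, theta=0.15, sigma=0.2, mu=0.0, dt=1.0, seed=0):
        self.theta, self.sigma, self.mu, self.dt = theta, sigma, mu, dt
        self.rng = random.Random(seed)
        self.x = mu  # start the process at its mean

    def sample(self):
        drift = self.theta * (self.mu - self.x) * self.dt
        diffusion = self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
        self.x += drift + diffusion
        return self.x


def ou_action(greedy_action, noise, low, high):
    """Add OU noise to a continuous greedy action and clip to [low, high]."""
    return min(high, max(low, greedy_action + noise.sample()))
```

Because each OU sample depends on the previous one, consecutive exploratory actions change smoothly, whereas ε-greedy can jump between unrelated actions from one step to the next. The abstract's comparison of gearbox fatigue and falls concerns exactly this kind of difference in how exploration perturbs the action signal.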

Original language	English
Pages (from-to)	6928-6933
Journal	IFAC-PapersOnLine
Volume	50
Issue number	1
DOIs	https://doi.org/10.1016/j.ifacol.2017.08.1218
Publication status	Published - 2017
Event	20th World Congress of the International Federation of Automatic Control (IFAC), 2017, Toulouse, France; Duration: 9 Jul 2017 → 14 Jul 2017; Conference number: 20; https://www.ifac2017.org

Keywords

Adaptation
Analysis of reliability
Autonomous robotic systems
diagnosis
Fault detection
learning in physical agents
Reinforcement learning control
safety

Cite this

@article{8df049df64ce430cb4c5e345c961d58d,

title = "Evaluation of physical damage associated with action selection strategies in reinforcement learning",

abstract = "Reinforcement learning techniques enable robots to deal with their own dynamics and with unknown environments without using explicit models or preprogrammed behaviors. However, reinforcement learning relies on intrinsically risky exploration, which is often damaging for physical systems. In the case of the bipedal walking robot Leo, which is studied in this paper, two sources of damage can be identified: fatigue of gearboxes due to backlash re-engagements, and the overall system damage due to falls of the robot. We investigate several exploration techniques and compare them in terms of gearbox fatigue, cumulative number of falls and undiscounted return. The results show that exploration with the Ornstein-Uhlenbeck (OU) process noise leads to the highest return, but at the same time it causes the largest number of falls. The Previous Action-Dependent Action (PADA) method results in drastically reduced fatigue, but also a large number of falls. The results reveal a previously unknown trade-off between the two sources of damage. Inspired by the OU and PADA methods, we propose four new action-selection methods in a systematic way. One of the proposed methods with a time-correlated noise outperforms the well-known $\epsilon$-greedy method in all three benchmarks. We provide guidance towards the choice of exploration strategy for reinforcement learning applications on real physical systems.",

keywords = "Adaptation, Analysis of reliability, Autonomous robotic systems, diagnosis, Fault detection, learning in physical agents, Reinforcement learning control, safety",

author = "Ivan Koryakovskiy and Heike Vallery and Robert Babu{\v s}ka and Wouter Caarls",

year = "2017",

doi = "10.1016/j.ifacol.2017.08.1218",

language = "English",

volume = "50",

pages = "6928--6933",

journal = "IFAC-PapersOnLine",

issn = "2405-8963",

publisher = "Elsevier",

number = "1",

note = "20th World Congress of the International Federation of Automatic Control (IFAC), 2017, IFAC 2017 ; Conference date: 09-07-2017 Through 14-07-2017",

url = "https://www.ifac2017.org",

}

TY - JOUR

T1 - Evaluation of physical damage associated with action selection strategies in reinforcement learning

AU - Koryakovskiy, Ivan

AU - Vallery, Heike

AU - Babuška, Robert

AU - Caarls, Wouter

N1 - Conference code: 20

PY - 2017

Y1 - 2017

N2 - Reinforcement learning techniques enable robots to deal with their own dynamics and with unknown environments without using explicit models or preprogrammed behaviors. However, reinforcement learning relies on intrinsically risky exploration, which is often damaging for physical systems. In the case of the bipedal walking robot Leo, which is studied in this paper, two sources of damage can be identified: fatigue of gearboxes due to backlash re-engagements, and the overall system damage due to falls of the robot. We investigate several exploration techniques and compare them in terms of gearbox fatigue, cumulative number of falls and undiscounted return. The results show that exploration with the Ornstein-Uhlenbeck (OU) process noise leads to the highest return, but at the same time it causes the largest number of falls. The Previous Action-Dependent Action (PADA) method results in drastically reduced fatigue, but also a large number of falls. The results reveal a previously unknown trade-off between the two sources of damage. Inspired by the OU and PADA methods, we propose four new action-selection methods in a systematic way. One of the proposed methods with a time-correlated noise outperforms the well-known ε-greedy method in all three benchmarks. We provide guidance towards the choice of exploration strategy for reinforcement learning applications on real physical systems.

AB - Reinforcement learning techniques enable robots to deal with their own dynamics and with unknown environments without using explicit models or preprogrammed behaviors. However, reinforcement learning relies on intrinsically risky exploration, which is often damaging for physical systems. In the case of the bipedal walking robot Leo, which is studied in this paper, two sources of damage can be identified: fatigue of gearboxes due to backlash re-engagements, and the overall system damage due to falls of the robot. We investigate several exploration techniques and compare them in terms of gearbox fatigue, cumulative number of falls and undiscounted return. The results show that exploration with the Ornstein-Uhlenbeck (OU) process noise leads to the highest return, but at the same time it causes the largest number of falls. The Previous Action-Dependent Action (PADA) method results in drastically reduced fatigue, but also a large number of falls. The results reveal a previously unknown trade-off between the two sources of damage. Inspired by the OU and PADA methods, we propose four new action-selection methods in a systematic way. One of the proposed methods with a time-correlated noise outperforms the well-known ε-greedy method in all three benchmarks. We provide guidance towards the choice of exploration strategy for reinforcement learning applications on real physical systems.

KW - Adaptation

KW - Analysis of reliability

KW - Autonomous robotic systems

KW - diagnosis

KW - Fault detection

KW - learning in physical agents

KW - Reinforcement learning control

KW - safety

UR - http://resolver.tudelft.nl/uuid:8df049df-64ce-430c-b4c5-e345c961d58d

UR - http://www.scopus.com/inward/record.url?scp=85031775284&partnerID=8YFLogxK

U2 - 10.1016/j.ifacol.2017.08.1218

DO - 10.1016/j.ifacol.2017.08.1218

M3 - Conference article

AN - SCOPUS:85031775284

SN - 2405-8963

VL - 50

SP - 6928

EP - 6933

JO - IFAC-PapersOnLine

JF - IFAC-PapersOnLine

IS - 1

T2 - 20th World Congress of the International Federation of Automatic Control (IFAC), 2017

Y2 - 9 July 2017 through 14 July 2017

ER -
