An Incremental Inverse Reinforcement Learning Approach for Motion Planning with Separated Path and Velocity Preferences

S. Avaei; L.F. van der Spaa; L. Peternel; J. Kober

doi:10.3390/robotics12020061

S. Avaei, L.F. van der Spaa*, L. Peternel, J. Kober

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Humans often demonstrate diverse behaviors due to their personal preferences, for instance, related to their individual execution style or personal margin for safety. In this paper, we consider the problem of integrating both path and velocity preferences into trajectory planning for robotic manipulators. We first learn reward functions that represent the user path and velocity preferences from kinesthetic demonstration. We then optimize the trajectory in two steps, first the path and then the velocity, to produce trajectories that adhere to both task requirements and user preferences. We design a set of parameterized features that capture the fundamental preferences in a pick-and-place type of object transportation task, both in the shape and timing of the motion. We demonstrate that our method is capable of generalizing such preferences to new scenarios. We implement our algorithm on a Franka Emika 7-DoF robot arm and validate the functionality and flexibility of our approach in a user study. The results show that non-expert users are able to teach the robot their preferences with just a few iterations of feedback.
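The incremental learning loop summarized in the abstract can be illustrated with a generic coactive-learning (preference perceptron) update. This is a minimal sketch under stated assumptions, not the authors' implementation: the reward is assumed to be linear in the features, and the function names and the two path features (path length and mean waypoint height) are hypothetical choices made only for illustration. The abstract's second step, the velocity preference, could be handled in the same way with its own weight vector and its own features.

```python
import numpy as np

def path_features(path):
    """Hypothetical path features: total length and mean height of the waypoints."""
    path = np.asarray(path, dtype=float)
    length = np.sum(np.linalg.norm(np.diff(path, axis=0), axis=1))
    mean_height = path[:, 2].mean()
    return np.array([length, mean_height])

def best_candidate(weights, candidates, features):
    """Return the candidate with the highest linear reward w . phi(candidate)."""
    scores = [weights @ features(c) for c in candidates]
    return candidates[int(np.argmax(scores))]

def coactive_update(weights, proposed, corrected, features, step=1.0):
    """Preference-perceptron step: shift the weights toward the corrected trajectory."""
    return weights + step * (features(corrected) - features(proposed))

# Two illustrative candidate paths (x, y, z waypoints): one low, one arching higher.
low = [[0.0, 0.0, 0.1], [0.5, 0.0, 0.1], [1.0, 0.0, 0.1]]
high = [[0.0, 0.0, 0.1], [0.5, 0.0, 0.5], [1.0, 0.0, 0.1]]

w = np.zeros(2)                                  # no preference learned yet
proposed = best_candidate(w, [low, high], path_features)
corrected = high                                 # assumed user feedback: prefers the higher path
w = coactive_update(w, proposed, corrected, path_features)
relearned = best_candidate(w, [low, high], path_features)
```

With all-zero initial weights the planner proposes the first candidate; after a single correction the updated weights rank the user's preferred path highest, which is the kind of iteration-by-iteration feedback the abstract describes.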

Original language	English
Article number	61
Number of pages	22
Journal	Robotics
Volume	12
Issue number	2
DOIs	https://doi.org/10.3390/robotics12020061
Publication status	Published - 2023

Keywords

learning from demonstration
human preferences
incremental inverse reinforcement learning
coactive learning
physical human–robot interaction

Access to Document

10.3390/robotics12020061

robotics-12-00061-v2 (Final published version, 3.3 MB; Licence: CC BY)

Cite this

@article{cfcf8ebd8db44b5ea1746c0c16b21dfd,

title = "An Incremental Inverse Reinforcement Learning Approach for Motion Planning with Separated Path and Velocity Preferences",

abstract = "Humans often demonstrate diverse behaviors due to their personal preferences, for instance, related to their individual execution style or personal margin for safety. In this paper, we consider the problem of integrating both path and velocity preferences into trajectory planning for robotic manipulators. We first learn reward functions that represent the user path and velocity preferences from kinesthetic demonstration. We then optimize the trajectory in two steps, first the path and then the velocity, to produce trajectories that adhere to both task requirements and user preferences. We design a set of parameterized features that capture the fundamental preferences in a pick-and-place type of object transportation task, both in the shape and timing of the motion. We demonstrate that our method is capable of generalizing such preferences to new scenarios. We implement our algorithm on a Franka Emika 7-DoF robot arm and validate the functionality and flexibility of our approach in a user study. The results show that non-expert users are able to teach the robot their preferences with just a few iterations of feedback.",

keywords = "learning from demonstration, human preferences, incremental inverse reinforcement learning, coactive learning, physical human–robot interaction",

author = "Avaei, S. and {van der Spaa}, L.F. and Peternel, L. and Kober, J.",

year = "2023",

doi = "10.3390/robotics12020061",

language = "English",

volume = "12",

journal = "Robotics",

issn = "2218-6581",

publisher = "MDPI",

number = "2",

}

TY - JOUR

T1 - An Incremental Inverse Reinforcement Learning Approach for Motion Planning with Separated Path and Velocity Preferences

AU - Avaei, S.

AU - van der Spaa, L.F.

AU - Peternel, L.

AU - Kober, J.

PY - 2023

Y1 - 2023

N2 - Humans often demonstrate diverse behaviors due to their personal preferences, for instance, related to their individual execution style or personal margin for safety. In this paper, we consider the problem of integrating both path and velocity preferences into trajectory planning for robotic manipulators. We first learn reward functions that represent the user path and velocity preferences from kinesthetic demonstration. We then optimize the trajectory in two steps, first the path and then the velocity, to produce trajectories that adhere to both task requirements and user preferences. We design a set of parameterized features that capture the fundamental preferences in a pick-and-place type of object transportation task, both in the shape and timing of the motion. We demonstrate that our method is capable of generalizing such preferences to new scenarios. We implement our algorithm on a Franka Emika 7-DoF robot arm and validate the functionality and flexibility of our approach in a user study. The results show that non-expert users are able to teach the robot their preferences with just a few iterations of feedback.

KW - learning from demonstration

KW - human preferences

KW - incremental inverse reinforcement learning

KW - coactive learning

KW - physical human–robot interaction

U2 - 10.3390/robotics12020061

DO - 10.3390/robotics12020061

M3 - Article

SN - 2218-6581

VL - 12

JO - Robotics

JF - Robotics

IS - 2

M1 - 61

ER -
