SEQUEL: Semi-Supervised Preference-based RL with Query Synthesis via Latent Interpolation

Daniel Marta*, Simon Holk, Christian Pek, Iolanda Leite

*Corresponding author for this work

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review


Abstract

Preference-based reinforcement learning (RL) has emerged as a recent research direction in robot learning, allowing humans to teach robots through preferences on pairs of desired behaviours. Nonetheless, obtaining realistic robot policies requires humans to answer an arbitrarily large number of queries. In this work, we approach the sample-efficiency challenge by presenting a technique that synthesizes queries from a semi-supervised learning perspective. To achieve this, we leverage latent variational autoencoder (VAE) representations of trajectory segments (sequences of state-action pairs). Our approach produces queries that are closely aligned with those labeled by humans, while avoiding excessive uncertainty in the human preference predictions derived from reward estimates. Additionally, by introducing variation without deviating from the human's original intent, more robust reward function representations are achieved. We compare our approach to recent state-of-the-art semi-supervised learning techniques for preference-based RL. Our experimental findings reveal that we can enhance the generalization of the estimated reward function without requiring additional human intervention. Lastly, to confirm the practical applicability of our approach, we conduct experiments involving actual human users in a simulated social navigation setting. Videos of the experiments can be found at https://sites.google.com/view/rl-sequel
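
The two ingredients named in the abstract, interpolation in a latent space and a check that predicted preferences are not excessively uncertain, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the linear interpolation rule, the Bradley-Terry preference model, and the `margin` threshold are all assumptions made for illustration, and the VAE encoder and decoder are omitted entirely.

```python
import math

def interpolate_latents(z_a, z_b, alpha):
    """Linearly interpolate between two latent vectors (hypothetical helper).

    alpha = 0 returns z_a, alpha = 1 returns z_b.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]

def preference_probability(rewards_0, rewards_1):
    """Probability that segment 0 is preferred over segment 1.

    Assumes a Bradley-Terry model over summed per-step reward estimates:
    P(0 > 1) = exp(R0) / (exp(R0) + exp(R1)) = 1 / (1 + exp(R1 - R0)).
    """
    diff = sum(rewards_1) - sum(rewards_0)
    # Branch on the sign so math.exp never receives a large positive argument.
    if diff >= 0:
        e = math.exp(-diff)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(diff))

def is_confident(p, margin=0.2):
    """Accept a synthesized query only if p is at least `margin` away from 0.5.

    The margin value is an illustrative assumption, not taken from the paper.
    """
    return abs(p - 0.5) >= margin

# Usage: blend two latent codes, then screen a candidate query by confidence.
z_new = interpolate_latents([0.0, 2.0], [1.0, 0.0], alpha=0.25)
p = preference_probability([1.0, 1.0], [0.0, 0.0])
keep = is_confident(p)
```

In a full pipeline one would presumably decode `z_new` back into a trajectory segment and score it with the learned reward model before applying the confidence check; those steps depend on model details that the abstract does not specify.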

Original language: English
Title of host publication: Proceedings of the IEEE International Conference on Robotics and Automation, ICRA 2024
Publisher: IEEE
Pages: 9585-9592
Number of pages: 8
ISBN (Electronic): 979-8-3503-8457-4
Publication status: Published - 2024
Event: 2024 IEEE International Conference on Robotics and Automation, ICRA 2024 - Yokohama, Japan
Duration: 13 May 2024 to 17 May 2024

Publication series

Name: Proceedings - IEEE International Conference on Robotics and Automation
ISSN (Print): 1050-4729

Conference

Conference: 2024 IEEE International Conference on Robotics and Automation, ICRA 2024
Country/Territory: Japan
City: Yokohama
Period: 13/05/24 to 17/05/24

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
