Noise-conditioned Energy-based Annealed Rewards (NEAR): A generative framework for imitation learning from observation

A.A. Diwan*, Julen Urain, J. Kober, Jan Peters

*Corresponding author for this work

Research output: Contribution to conference › Poster › Scientific


Abstract

This paper introduces a new imitation learning framework based on energy-based generative models, capable of learning complex, physics-dependent robot motion policies from state-only expert motion trajectories. Our algorithm, called Noise-conditioned Energy-based Annealed Rewards (NEAR), constructs several perturbed versions of the expert's motion data distribution and learns smooth, well-defined representations of the data distribution's energy function using denoising score matching. We propose to use these learnt energy functions as reward functions for learning imitation policies via reinforcement learning. We also present a strategy for gradually switching between the learnt energy functions, ensuring that the learnt rewards are always well-defined on the manifold of policy-generated samples. We evaluate our algorithm on complex humanoid tasks such as locomotion and martial arts, and compare it with state-only adversarial imitation learning algorithms like Adversarial Motion Priors (AMP). Our framework sidesteps the optimisation challenges of adversarial imitation learning techniques and produces results comparable to AMP on several quantitative metrics across multiple imitation settings.
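The core training signal described above is denoising score matching: a model is fit to the score of a noise-perturbed version of the data distribution, and the learnt energy (whose gradient is the score) is then reused as a reward. The following is a minimal, self-contained 1-D sketch of the denoising score matching objective, not the paper's implementation; `dsm_loss`, the toy dataset, and the candidate score functions are illustrative assumptions.

```python
import random

def dsm_loss(score_fn, data, sigma, n=2000, seed=0):
    """Monte-Carlo estimate of the 1-D denoising score matching loss
    E_{x, eps} [ (score_fn(x + sigma*eps) + eps/sigma)^2 ],
    whose minimiser is the score of the sigma-perturbed data density."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.choice(data)          # sample a clean data point
        eps = rng.gauss(0.0, 1.0)     # standard Gaussian noise
        x_noisy = x + sigma * eps     # perturb with noise level sigma
        diff = score_fn(x_noisy) + eps / sigma
        total += diff * diff
    return total / n

# Toy data: all mass at the origin. The sigma-perturbed density is then
# N(0, sigma^2), whose true score is s(x) = -x / sigma^2.
data = [0.0]
sigma = 0.5
true_score = lambda x: -x / sigma ** 2
zero_score = lambda x: 0.0

# The analytic score of the perturbed distribution drives the loss to zero,
# while an arbitrary function (here, identically zero) does not.
assert dsm_loss(true_score, data, sigma) < dsm_loss(zero_score, data, sigma)
```

In NEAR's setting, one such score/energy model would be trained per noise level sigma, and the annealing strategy described in the abstract selects which level's energy currently serves as the reward.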
Original language: English
Publication status: Published - 2025
Event: 13th International Conference on Learning Representations, ICLR 2025 - Singapore, Singapore
Duration: 24 Apr 2025 - 28 Apr 2025
Conference number: 13

Conference

Conference: 13th International Conference on Learning Representations, ICLR 2025
Abbreviated title: ICLR 2025
Country/Territory: Singapore
City: Singapore
Period: 24/04/25 - 28/04/25
