To the Max: Reinventing Reward in Reinforcement Learning

Grigorii Veviurko*, Wendelin Böhmer, Mathijs de Weerdt

*Corresponding author for this work

Research output: Contribution to journal › Conference article › Scientific › peer-review


Abstract

In reinforcement learning (RL), different reward functions can define the same optimal policy yet lead to drastically different learning performance: with some, the agent gets stuck in suboptimal behavior, while with others it solves the task efficiently. Choosing a good reward function is hence an extremely important yet challenging problem. In this paper, we explore an alternative approach to using rewards for learning. We introduce max-reward RL, where an agent optimizes the maximum rather than the cumulative reward. Unlike earlier works, our approach works for both deterministic and stochastic environments and can be easily combined with state-of-the-art RL algorithms. In the experiments, we study the performance of max-reward RL algorithms in two goal-reaching environments from Gymnasium-Robotics and demonstrate their benefits over standard RL. The code is available at https://github.com/veviurko/To-the-Max.
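The contrast described in the abstract can be sketched in a few lines. This is an illustrative toy example, not the authors' implementation: for a single trajectory of rewards, it computes the standard discounted cumulative return alongside the max-reward return that max-reward RL optimizes.

```python
def cumulative_return(rewards, gamma=0.99):
    # Standard RL objective: discounted sum of rewards along the trajectory.
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def max_reward_return(rewards):
    # Max-reward RL objective (as described in the abstract):
    # the maximum single reward observed along the trajectory.
    return max(rewards)

# Toy trajectory: mostly small rewards with one large goal-reaching reward.
rewards = [0.0, 0.1, 0.0, 0.9, 0.2]
print(cumulative_return(rewards))
print(max_reward_return(rewards))  # 0.9
```

In a sparse goal-reaching task, the max-reward objective depends only on the best event along the trajectory, which is one intuition for why it can behave differently from the cumulative objective during learning.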

Original language: English
Pages (from-to): 49455-49470
Number of pages: 16
Journal: Proceedings of Machine Learning Research
Volume: 235
Publication status: Published - 2024
Event: 41st International Conference on Machine Learning, ICML 2024 - Vienna, Austria
Duration: 21 Jul 2024 - 27 Jul 2024

