Learning Multimodal Transition Dynamics for Model-Based Reinforcement Learning

Thomas M. Moerland, Joost Broekens, Catholijn M. Jonker

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

In this paper we study how to learn stochastic, multimodal transition dynamics in reinforcement learning (RL) tasks. We focus on evaluating transition function estimation, while we defer planning over this model to future work. Stochasticity is a fundamental property of many task environments. However, discriminative function approximators have difficulty estimating multimodal stochasticity. In contrast, deep generative models do capture complex high-dimensional outcome distributions. First we discuss why, amongst such models, conditional variational inference (VI) is theoretically most appealing for model-based RL. Subsequently, we compare different VI models on their ability to learn complex stochasticity on simulated functions, as well as on a typical RL gridworld with multimodal dynamics. Results show VI successfully predicts multimodal outcomes, but also robustly ignores these for deterministic parts of the transition dynamics. In summary, we show a robust method to learn multimodal transitions using function approximation, which is a key preliminary for model-based RL in stochastic domains.
Original language: English
Title of host publication: SURL 2017: 1st Scaling-Up Reinforcement Learning (SURL) Workshop
Pages: 1-18
Number of pages: 18
Publication status: Published - 2017
Event: SURL 2017: 1st Scaling-Up Reinforcement Learning (SURL) Workshop - Skopje, Macedonia, The Former Yugoslav Republic of
Duration: 18 Sept 2017 → 18 Sept 2017

Workshop

Workshop: SURL 2017: 1st Scaling-Up Reinforcement Learning (SURL) Workshop
Country/Territory: Macedonia, The Former Yugoslav Republic of
City: Skopje
Period: 18/09/17 → 18/09/17
