Influence-aware Memory for Deep Reinforcement Learning in POMDPs

Research output: Contribution to conference › Paper › Scientific › peer-review


Abstract

Making the right decisions when some of the state variables are hidden requires removing the uncertainty about the current state of the environment. An agent receiving only partial observations needs to infer the true values of these hidden variables from the history of observations. Recent deep reinforcement learning methods use recurrent models to keep track of past information. However, these models are expensive to train and have convergence difficulties, especially when dealing with high-dimensional input spaces. Inspired by theory from influence-based abstraction, which asserts that in order to predict the hidden state variables we may only need to remember a small subset of observation variables, we propose InfluenceNet. This new neural architecture tries to overcome the training difficulties in high-dimensional problems by restricting the input to the recurrent layers to those variables that carry important information about the non-Markovian dynamics. Results indicate that, by forcing the agent's internal memory to focus on this subset rather than on the full observation, we can outperform ordinary recurrent architectures. This approach also reduces training time and achieves better scores than methods that stack multiple observations to remove partial observability.
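The idea described in the abstract can be summarized in a short sketch: a feedforward path encodes the full observation, while only a chosen subset of observation variables is fed to the recurrent memory, and the two feature streams are concatenated before the policy head. The sketch below is illustrative only; the PyTorch framing, the layer sizes, the choice of a GRU cell, and the hypothetical influence_idx argument for selecting the subset are assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class InfluenceNetSketch(nn.Module):
    # Minimal sketch, assuming a discrete-action policy head; layer sizes,
    # the GRU cell, and index-based subset selection are illustrative
    # assumptions rather than the published architecture.
    def __init__(self, obs_dim, influence_idx, hidden_dim=64, rnn_dim=32, n_actions=4):
        super().__init__()
        # Indices of the observation variables assumed to carry the
        # non-Markovian information the memory must track.
        self.register_buffer("influence_idx",
                             torch.as_tensor(influence_idx, dtype=torch.long))
        # Feedforward path over the full observation.
        self.fnn = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Recurrent path: only the selected subset reaches the internal memory.
        self.rnn = nn.GRU(input_size=len(influence_idx), hidden_size=rnn_dim,
                          batch_first=True)
        # Policy head over the concatenated feedforward and memory features.
        self.policy = nn.Linear(hidden_dim + rnn_dim, n_actions)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim); hidden: optional GRU state.
        feat = self.fnn(obs_seq)                   # per-step features from the full observation
        subset = obs_seq[..., self.influence_idx]  # restrict what the recurrent layer sees
        mem, hidden = self.rnn(subset, hidden)     # internal memory over the subset only
        logits = self.policy(torch.cat([feat, mem], dim=-1))
        return logits, hidden

In use, obs_seq would be a (batch, time, obs_dim) tensor and the returned hidden state would be carried across rollout segments, as in standard recurrent policy training.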
Original language: English
Number of pages: 11
Publication status: Accepted/In press - 11 Dec 2020
Event: Deep Reinforcement Learning Workshop, NeurIPS 2020
Duration: 11 Dec 2020 - 11 Dec 2020

Workshop

Workshop: Deep Reinforcement Learning Workshop, NeurIPS 2020
Period: 11/12/20 - 11/12/20
