Making the right decisions when some of the state variables are hidden requires removing the uncertainty about the current state of the environment. An agent receiving only partial observations needs to infer the true values of these hidden variables from the history of observations. Recent deep reinforcement learning methods use recurrent models to keep track of past information. However, these models are expensive to train and have convergence difficulties, especially with high-dimensional input spaces. Inspired by theory from influence-based abstraction, which asserts that to predict the hidden state variables we may only need to remember a small subset of observation variables, we propose InfluenceNet. This new neural architecture aims to overcome the training difficulties in high-dimensional problems by restricting the input to the recurrent layers to those variables carrying important information about the non-Markovian dynamics. Results indicate that, by forcing the agent's internal memory to focus on this subset rather than on the full observation, we can outperform ordinary recurrent architectures. This approach also reduces training time and obtains better scores than methods that stack multiple observations to remove partial observability.
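The core architectural idea described above can be illustrated with a minimal sketch: only a designated subset of observation variables feeds the recurrent memory, while the full observation bypasses it and is concatenated with the hidden state. All names, indices, and dimensions below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical sketch of the InfluenceNet idea: only a small subset of
# observation variables (the assumed "influence sources") is routed into
# the recurrent memory; the rest of the observation bypasses it.
OBS_DIM = 8          # full observation size (assumed)
INF_IDX = [0, 3]     # assumed indices of influence-source variables
HID_DIM = 4          # recurrent hidden size (assumed)

rng = np.random.default_rng(0)
W_in = rng.normal(size=(HID_DIM, len(INF_IDX)))  # input-to-hidden weights
W_h = rng.normal(size=(HID_DIM, HID_DIM))        # hidden-to-hidden weights

def step(obs, h):
    """One time step: the recurrence sees only obs[INF_IDX]."""
    x_inf = obs[INF_IDX]                     # restricted recurrent input
    h_new = np.tanh(W_in @ x_inf + W_h @ h)  # plain RNN update
    features = np.concatenate([obs, h_new])  # current obs + internal memory
    return features, h_new

h = np.zeros(HID_DIM)
for t in range(3):
    obs = rng.normal(size=OBS_DIM)
    features, h = step(obs, h)

print(features.shape)  # (OBS_DIM + HID_DIM,) = (12,)
```

The combined feature vector would then feed the policy or value head; the point of the sketch is only that the recurrent layer's input dimension is `len(INF_IDX)` rather than `OBS_DIM`.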
Number of pages: 11
Publication status: Accepted/In press - 11 Dec 2020
Event: Deep Reinforcement Learning Workshop, NeurIPS 2020
Duration: 11 Dec 2020 → 11 Dec 2020