TY - JOUR
T1 - Influence-aware memory architectures for deep reinforcement learning in POMDPs
AU - Suau, Miguel
AU - He, Jinke
AU - Congeduti, Elena
AU - Starre, Rolf
AU - Czechowski, Aleksander
AU - Oliehoek, Frans A.
PY - 2022
Y1 - 2022
N2 - Due to its perceptual limitations, an agent may have too little information about the environment to act optimally. In such cases, it is important to keep track of the action-observation history to uncover hidden state information. Recent deep reinforcement learning methods use recurrent neural networks (RNNs) to memorize past observations. However, these models are expensive to train and have convergence difficulties, especially when dealing with high-dimensional data. In this paper, we propose influence-aware memory, a theoretically inspired memory architecture that alleviates the training difficulties by restricting the input of the recurrent layers to those variables that influence the hidden state information. Moreover, as opposed to standard RNNs, in which every piece of information used for estimating Q-values is inevitably fed back into the network for the next prediction, our model allows information to flow without necessarily being stored in the RNN’s internal memory. Results indicate that, by letting the recurrent layers focus on a small fraction of the observation variables while processing the rest of the information with a feedforward neural network, we can outperform standard recurrent architectures both in training speed and policy performance. This approach also reduces runtime and obtains better scores than methods that stack multiple observations to remove partial observability.
KW - Conditional independence
KW - Influence
KW - Partial observability
KW - Recurrent neural networks
KW - Reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85137515783&partnerID=8YFLogxK
U2 - 10.1007/s00521-022-07691-7
DO - 10.1007/s00521-022-07691-7
M3 - Article
AN - SCOPUS:85137515783
SN - 0941-0643
JO - Neural Computing and Applications
JF - Neural Computing and Applications
ER -