Influence-aware memory architectures for deep reinforcement learning in POMDPs

Miguel Suau; Jinke He; Elena Congeduti; Rolf Starre; Aleksander Czechowski; Frans A. Oliehoek

doi:10.1007/s00521-022-07691-7

Miguel Suau*, Jinke He, Elena Congeduti, Rolf Starre, Aleksander Czechowski, Frans A. Oliehoek

*Corresponding author for this work

Interactive Intelligence

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Due to its perceptual limitations, an agent may have too little information about the environment to act optimally. In such cases, it is important to keep track of the action-observation history to uncover hidden state information. Recent deep reinforcement learning methods use recurrent neural networks (RNN) to memorize past observations. However, these models are expensive to train and have convergence difficulties, especially when dealing with high dimensional data. In this paper, we propose influence-aware memory, a theoretically inspired memory architecture that alleviates the training difficulties by restricting the input of the recurrent layers to those variables that influence the hidden state information. Moreover, as opposed to standard RNNs, in which every piece of information used for estimating Q values is inevitably fed back into the network for the next prediction, our model allows information to flow without being necessarily stored in the RNN’s internal memory. Results indicate that, by letting the recurrent layers focus on a small fraction of the observation variables while processing the rest of the information with a feedforward neural network, we can outperform standard recurrent architectures both in training speed and policy performance. This approach also reduces runtime and obtains better scores than methods that stack multiple observations to remove partial observability.
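The two-branch idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `InfluenceAwareSketch`, the index list `memory_idx`, the layer sizes, the Elman-style recurrent cell and the random initialisation are all assumptions made for illustration, and the abstract does not say how the memory variables are chosen. The sketch only shows a feedforward branch that sees the full observation, a recurrent branch that sees a small subset of it, and a Q-value head that combines both.

```python
import math
import random


def linear(weights, bias, x):
    """Affine map: weights is a list of rows, one per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def init(rows, cols, rng):
    """Small random weight matrix plus a zero bias (illustrative initialisation)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)], [0.0] * rows


class InfluenceAwareSketch:
    """Hypothetical two-branch Q-network; not the authors' implementation."""

    def __init__(self, obs_dim, memory_idx, ff_dim, rnn_dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.memory_idx = list(memory_idx)  # assumed to be given, not learned here
        self.rnn_dim = rnn_dim
        self.ff_w, self.ff_b = init(ff_dim, obs_dim, rng)
        # The recurrent cell sees only the selected variables plus its own state.
        self.rnn_w, self.rnn_b = init(rnn_dim, len(self.memory_idx) + rnn_dim, rng)
        self.q_w, self.q_b = init(n_actions, ff_dim + rnn_dim, rng)

    def initial_state(self):
        return [0.0] * self.rnn_dim

    def step(self, obs, hidden):
        # Feedforward branch: whole observation, nothing carried over time.
        ff = [max(0.0, v) for v in linear(self.ff_w, self.ff_b, obs)]
        # Recurrent branch: only the selected subset enters the memory.
        subset = [obs[i] for i in self.memory_idx]
        hidden = [math.tanh(v) for v in linear(self.rnn_w, self.rnn_b, subset + hidden)]
        # Q-value head combines both branches.
        q_values = linear(self.q_w, self.q_b, ff + hidden)
        return q_values, hidden


net = InfluenceAwareSketch(obs_dim=6, memory_idx=[0, 1], ff_dim=4, rnn_dim=3, n_actions=2)
h = net.initial_state()
for obs in ([1.0, 0.0, 0.5, 0.2, 0.1, 0.9], [0.0, 1.0, 0.5, 0.2, 0.1, 0.9]):
    q, h = net.step(obs, h)
```

In this sketch, observation variables outside `memory_idx` can still affect the current Q-values through the feedforward branch, but they never reach the recurrent state, which mirrors the abstract's point that information can flow without necessarily being stored in memory.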

Original language	English
Number of pages	17
Journal	Neural Computing and Applications
DOIs	https://doi.org/10.1007/s00521-022-07691-7
Publication status	Published - 2022

Keywords

Conditional independence
Influence
Partial observability
Recurrent neural networks
Reinforcement learning

Access to Document

10.1007/s00521-022-07691-7

Final published version, 3.09 MB, Licence: CC BY

Cite this

@article{7cf40b5373534e8dac41c6bbe052ecbe,
title = "Influence-aware memory architectures for deep reinforcement learning in POMDPs",
abstract = "Due to its perceptual limitations, an agent may have too little information about the environment to act optimally. In such cases, it is important to keep track of the action-observation history to uncover hidden state information. Recent deep reinforcement learning methods use recurrent neural networks (RNN) to memorize past observations. However, these models are expensive to train and have convergence difficulties, especially when dealing with high dimensional data. In this paper, we propose influence-aware memory, a theoretically inspired memory architecture that alleviates the training difficulties by restricting the input of the recurrent layers to those variables that influence the hidden state information. Moreover, as opposed to standard RNNs, in which every piece of information used for estimating Q values is inevitably fed back into the network for the next prediction, our model allows information to flow without being necessarily stored in the RNN{\textquoteright}s internal memory. Results indicate that, by letting the recurrent layers focus on a small fraction of the observation variables while processing the rest of the information with a feedforward neural network, we can outperform standard recurrent architectures both in training speed and policy performance. This approach also reduces runtime and obtains better scores than methods that stack multiple observations to remove partial observability.",
keywords = "Conditional independence, Influence, Partial observability, Recurrent neural networks, Reinforcement learning",
author = "Miguel Suau and Jinke He and Elena Congeduti and Rolf Starre and Aleksander Czechowski and Oliehoek, {Frans A.}",
year = "2022",
doi = "10.1007/s00521-022-07691-7",
language = "English",
journal = "Neural Computing and Applications",
issn = "0941-0643",
publisher = "Springer",
}

TY - JOUR

T1 - Influence-aware memory architectures for deep reinforcement learning in POMDPs

AU - Suau, Miguel

AU - He, Jinke

AU - Congeduti, Elena

AU - Starre, Rolf

AU - Czechowski, Aleksander

AU - Oliehoek, Frans A.

PY - 2022

Y1 - 2022

AB - Due to its perceptual limitations, an agent may have too little information about the environment to act optimally. In such cases, it is important to keep track of the action-observation history to uncover hidden state information. Recent deep reinforcement learning methods use recurrent neural networks (RNN) to memorize past observations. However, these models are expensive to train and have convergence difficulties, especially when dealing with high dimensional data. In this paper, we propose influence-aware memory, a theoretically inspired memory architecture that alleviates the training difficulties by restricting the input of the recurrent layers to those variables that influence the hidden state information. Moreover, as opposed to standard RNNs, in which every piece of information used for estimating Q values is inevitably fed back into the network for the next prediction, our model allows information to flow without being necessarily stored in the RNN’s internal memory. Results indicate that, by letting the recurrent layers focus on a small fraction of the observation variables while processing the rest of the information with a feedforward neural network, we can outperform standard recurrent architectures both in training speed and policy performance. This approach also reduces runtime and obtains better scores than methods that stack multiple observations to remove partial observability.

KW - Conditional independence

KW - Influence

KW - Partial observability

KW - Recurrent neural networks

KW - Reinforcement learning

UR - http://www.scopus.com/inward/record.url?scp=85137515783&partnerID=8YFLogxK

U2 - 10.1007/s00521-022-07691-7

DO - 10.1007/s00521-022-07691-7

M3 - Article

AN - SCOPUS:85137515783

SN - 0941-0643

JO - Neural Computing and Applications

JF - Neural Computing and Applications

ER -
