Transient non-stationarity and generalisation in deep reinforcement learning

Maximilian Igl, Gregory Farquhar, Jelena Luketina, Wendelin Böhmer, Shimon Whiteson

Research output: Contribution to conference › Paper › peer-review


Abstract

Non-stationarity can arise in Reinforcement Learning (RL) even in stationary environments. For example, most RL algorithms collect new data throughout training, using a non-stationary behaviour policy. Due to the transience of this non-stationarity, it is often not explicitly addressed in deep RL and a single neural network is continually updated. However, we find evidence that neural networks exhibit a memory effect, where these transient non-stationarities can permanently impact the latent representation and adversely affect generalisation performance. Consequently, to improve generalisation of deep RL agents, we propose Iterated Relearning (ITER). ITER augments standard RL training by repeated knowledge transfer of the current policy into a freshly initialised network, which thereby experiences less non-stationarity during training. Experimentally, we show that ITER improves performance on the challenging generalisation benchmarks ProcGen and Multiroom.
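The abstract describes ITER as repeated knowledge transfer of the current policy into a freshly initialised network. The sketch below illustrates that idea only, assuming a simple KL-based policy-distillation step in PyTorch; the architecture, loss, and schedule are illustrative assumptions, not necessarily the authors' exact implementation.

```python
# Hypothetical sketch of the ITER idea from the abstract: periodically distil
# the current policy (teacher) into a freshly initialised student network,
# which then replaces the teacher and continues standard RL training.
# Network sizes, the distillation objective, and the schedule are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_policy(obs_dim: int, n_actions: int) -> nn.Module:
    # Small MLP policy head; the architecture is a placeholder assumption.
    return nn.Sequential(
        nn.Linear(obs_dim, 64), nn.ReLU(),
        nn.Linear(64, n_actions),
    )


def distil_step(teacher: nn.Module, student: nn.Module,
                obs: torch.Tensor, optimiser: torch.optim.Optimizer) -> float:
    """One knowledge-transfer step: match the student's action distribution
    to the frozen teacher's on a batch of observations."""
    with torch.no_grad():
        teacher_logits = teacher(obs)
    student_logits = student(obs)
    # KL divergence between teacher and student action distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    return loss.item()


# Usage sketch: after some RL training of `teacher`, create a fresh student,
# distil on observations from the current behaviour policy, then swap networks.
obs_dim, n_actions = 8, 4
teacher = make_policy(obs_dim, n_actions)
student = make_policy(obs_dim, n_actions)   # freshly initialised network
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
for _ in range(100):
    batch_obs = torch.randn(32, obs_dim)    # stand-in for on-policy observations
    distil_step(teacher, student, batch_obs, opt)
teacher = student                            # student becomes the new policy
```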
Original language: English
Number of pages: 16
Publication status: Published - 2021
Event: 9th International Conference on Learning Representations - Virtual Conference
Duration: 3 May 2021 → 7 May 2021
Conference number: 9
https://iclr.cc/Conferences/2021/Dates

Conference

Conference: 9th International Conference on Learning Representations
Abbreviated title: ICLR 2021
Period: 3/05/21 → 7/05/21

Keywords

  • Reinforcement Learning
  • Generalization
