Fine-tuning deep RL with gradient-free optimization

Tim de Bruin*, Jens Kober, Karl Tuyls, Robert Babuška

*Corresponding author for this work

Research output: Contribution to journal › Conference article › Scientific › peer-review


Abstract

Deep reinforcement learning makes it possible to train control policies that map high-dimensional observations to actions. These methods typically use gradient-based optimization techniques to enable relatively efficient learning, but are notoriously sensitive to hyperparameter choices and do not have good convergence properties. Gradient-free optimization methods, such as evolutionary strategies, can offer a more stable alternative but tend to be much less sample efficient. In this work we propose a combination, using the relative strengths of both. We start with a gradient-based initial training phase, which is used to quickly learn both a state representation and an initial policy. This phase is followed by a gradient-free optimization of only the final action selection parameters. This enables the policy to improve in a stable manner to a performance level not obtained by gradient-based optimization alone, using many fewer trials than methods using only gradient-free optimization. We demonstrate the effectiveness of the method on two Atari games, a continuous control benchmark and the CarRacing-v0 benchmark. On the latter we surpass the best previously reported score while using significantly fewer episodes.
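The two-phase idea in the abstract — a gradient-based phase that learns the representation, followed by gradient-free optimization of only the final action-selection parameters — can be illustrated with a minimal evolution-strategies sketch. This is not the authors' implementation: the fitness function, hyperparameters, and the toy objective below are all hypothetical stand-ins; in the paper's setting `episode_return` would be the return of a full environment rollout using the frozen pretrained feature extractor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for an episodic return: the optimizer only sees
# scalar returns, never gradients. In the paper's setting this would run
# the policy (frozen trunk + final-layer weights `w`) for one episode.
target = np.array([0.5, -1.0, 2.0])  # unknown to the optimizer
def episode_return(w):
    return -np.sum((w - target) ** 2)

def es_finetune(w_init, sigma=0.1, pop=50, lr=0.05, iters=200):
    """Basic evolution strategies over only the final-layer weights:
    perturb `w`, collect returns, and move `w` along the
    return-weighted average of the perturbations."""
    w = w_init.copy()
    for _ in range(iters):
        eps = rng.standard_normal((pop, w.size))
        returns = np.array([episode_return(w + sigma * e) for e in eps])
        # Standardize returns to reduce the variance of the update.
        returns = (returns - returns.mean()) / (returns.std() + 1e-8)
        w = w + lr / (pop * sigma) * eps.T @ returns
    return w

# `w0` plays the role of the head produced by the gradient-based phase.
w0 = np.zeros(3)
w_star = es_finetune(w0)
```

Because only the low-dimensional head is optimized, each generation needs far fewer parameters to perturb than full-network neuroevolution, which is the source of the sample-efficiency gain the abstract describes.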

Original language: English
Pages (from-to): 8049-8056
Journal: IFAC-PapersOnLine
Volume: 53
Issue number: 2
DOIs
Publication status: Published - 2020
Event: 21st IFAC World Congress 2020, Berlin, Germany
Duration: 12 Jul 2020 – 17 Jul 2020

Keywords

  • Control
  • Deep learning
  • Neural networks
  • Optimization
  • Reinforcement learning
