Generalized Optimistic Q-Learning with Provable Efficiency

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Reinforcement learning (RL), like any on-line learning method, inevitably faces the exploration-exploitation dilemma. When a learning algorithm requires as few data samples as possible, it is called sample efficient. The design of sample-efficient algorithms is an important area of research. Interestingly, all currently known provably efficient model-free RL algorithms utilize the same well-known principle of optimism in the face of uncertainty. We unite these existing algorithms into a single general model-free optimistic RL framework. We show how this facilitates the design of new optimistic model-free RL algorithms by simplifying the analysis of their efficiency. Finally, we propose one such new algorithm and demonstrate its performance in an experimental study.
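
The principle of optimism in the face of uncertainty mentioned above can be illustrated with a minimal tabular sketch. This is not the framework or the new algorithm proposed in the paper, which the abstract does not specify. It is a generic episodic Q-learning loop that starts from optimistic value estimates and adds a count-based exploration bonus. The function names, the toy environment `two_armed_step`, the step size (H + 1) / (H + t), the bonus shape sqrt(H / t), and the clipping rule are all assumptions chosen for illustration.

```python
import math
import random


def two_armed_step(state, action, rng):
    """Hypothetical toy environment: one state, action 1 pays 1, action 0 pays 0."""
    reward = 1.0 if action == 1 else 0.0
    return reward, 0  # (reward, next_state)


def optimistic_q_learning(step, n_states, n_actions, horizon, episodes,
                          bonus_scale=1.0, seed=0):
    """Episodic tabular Q-learning with optimistic initialisation and a bonus.

    Assumes rewards lie in [0, 1], so the return from stage h is at most
    horizon - h. `step(state, action, rng)` returns (reward, next_state).
    """
    rng = random.Random(seed)
    H = horizon
    # Optimistic start: every estimate equals the largest achievable return.
    Q = [[[float(H - h)] * n_actions for _ in range(n_states)] for h in range(H)]
    N = [[[0] * n_actions for _ in range(n_states)] for _ in range(H)]
    returns = []
    for _ in range(episodes):
        state, total = 0, 0.0
        for h in range(H):
            # Act greedily on the optimistic estimates (ties go to the lowest index).
            action = max(range(n_actions), key=lambda a: Q[h][state][a])
            reward, next_state = step(state, action, rng)
            N[h][state][action] += 1
            t = N[h][state][action]
            alpha = (H + 1) / (H + t)                # assumed step-size schedule
            bonus = bonus_scale * math.sqrt(H / t)   # assumed bonus, shrinks with visits
            v_next = max(Q[h + 1][next_state]) if h + 1 < H else 0.0
            target = reward + v_next + bonus
            updated = (1 - alpha) * Q[h][state][action] + alpha * target
            # Clip so that estimates never exceed the maximum possible return.
            Q[h][state][action] = min(float(H - h), updated)
            state = next_state
            total += reward
        returns.append(total)
    return Q, N, returns


if __name__ == "__main__":
    _, _, returns = optimistic_q_learning(
        two_armed_step, n_states=1, n_actions=2, horizon=3, episodes=50)
    print(returns[:8], returns[-1])
```

In this toy setting the agent first tries the zero-reward action, its estimate falls below the optimistic cap once the bonus has shrunk, and the agent then switches to the rewarding action. The bonus decays with the visit count, so exploration of a state-action pair stops once enough samples have been collected, which is where sample efficiency arguments typically come from.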
Original language: English
Title of host publication: Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2020
Editors: Bo An, Amal El Fallah Seghrouchni, Gita Sukthankar
Publisher: International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
Number of pages: 9
ISBN (Electronic): 978-1-4503-7518-4
Publication status: Published - May 2020
Event: AAMAS 2020: The 19th International Conference on Autonomous Agents and Multi-Agent Systems - Auckland, New Zealand
Duration: 9 May 2020 - 13 May 2020
Conference number: 19th

Publication series

Name: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
ISSN (Print): 1548-8403
ISSN (Electronic): 1558-2914


Conference: AAMAS 2020
Country/Territory: New Zealand
Other: Virtual/online event due to COVID-19

  • Model-free learning
  • Reinforcement learning
  • Sample efficiency

