Abstract
Reinforcement learning (RL), like any on-line learning method, inevitably faces the exploration-exploitation dilemma. When a learning algorithm requires as few data samples as possible, it is called sample efficient. The design of sample-efficient algorithms is an important area of research. Interestingly, all currently known provably efficient model-free RL algorithms utilize the same well-known principle of optimism in the face of uncertainty. We unite these existing algorithms into a single general model-free optimistic RL framework. We show how this facilitates the design of new optimistic model-free RL algorithms by simplifying the analysis of their efficiency. Finally, we propose one such new algorithm and demonstrate its performance in an experimental study.
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2020 |
| Editors | Bo An, Amal El Fallah Seghrouchni, Gita Sukthankar |
| Publisher | International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS) |
| Pages | 913-921 |
| Number of pages | 9 |
| ISBN (Electronic) | 978-1-4503-7518-4 |
| Publication status | Published - May 2020 |
| Event | AAMAS 2020: The 19th International Conference on Autonomous Agents and Multi-Agent Systems - Auckland, New Zealand; Duration: 9 May 2020 → 13 May 2020; Conference number: 19th; https://aamas2020.conference.auckland.ac.nz |
Publication series
| Name | Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS |
|---|---|
| Volume | 2020-May |
| ISSN (Print) | 1548-8403 |
| ISSN (Electronic) | 1558-2914 |
Conference
| Conference | AAMAS 2020 |
|---|---|
| Country/Territory | New Zealand |
| City | Auckland |
| Period | 9/05/20 → 13/05/20 |
| Other | Virtual/online event due to COVID-19 |
| Internet address | https://aamas2020.conference.auckland.ac.nz |
Bibliographical note
Virtual/online event due to COVID-19
Keywords
- Model-free learning
- Reinforcement learning
- Sample efficiency
Fingerprint
Dive into the research topics of 'Generalized Optimistic Q-Learning with Provable Efficiency'. Together they form a unique fingerprint.
Research output
- 3 Citations
- 1 Dissertation (TU Delft)
Generalized Models of Sequential Decision-Making under Uncertainty
Neustroev, G., 2022, 219 p.
Research output: Thesis › Dissertation (TU Delft)