Generalized Optimistic Q-Learning with Provable Efficiency

Activity: Talk or presentation at a conference


Reinforcement learning (RL), like any online learning method, inevitably faces the exploration-exploitation dilemma. A learning algorithm is called sample efficient when it needs as few data samples as possible, and the design of such algorithms is an important area of research. Interestingly, all currently known provably efficient model-free RL algorithms rely on the same well-known principle: optimism in the face of uncertainty. We unite these existing algorithms in a single general framework for model-free optimistic RL. We show how this framework facilitates the design of new optimistic model-free RL algorithms by simplifying the analysis of their efficiency. Finally, we propose one such new algorithm and demonstrate its performance in an experimental study.
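To make the principle of optimism in the face of uncertainty concrete, the following is a minimal sketch of tabular, episodic Q-learning with optimistic initialization and a count-based exploration bonus. It is not the framework or the new algorithm presented in the talk. The learning-rate schedule, the form of the bonus, the constant `c`, the environment callback `step`, and the toy `chain_step` environment are all assumptions chosen for illustration.

```python
import math
import random


def chain_step(state, action, rng):
    """Hypothetical 3-state chain used only for illustration.

    Action 0 resets to state 0 for a small reward; action 1 tries to move
    right and pays reward 1.0 only when taken in the last state.
    """
    if action == 0:
        return 0.1, 0
    reward = 1.0 if state == 2 else 0.0
    moved = rng.random() < 0.9  # the move fails 10% of the time
    next_state = min(state + 1, 2) if moved else state
    return reward, next_state


def optimistic_q_learning(n_states, n_actions, horizon, episodes, step,
                          c=1.0, seed=0):
    """Tabular episodic Q-learning with an optimism bonus (illustrative).

    `step(state, action, rng)` must return (reward in [0, 1], next_state).
    """
    rng = random.Random(seed)
    H = horizon
    # Optimistic initialization: no return can exceed H when rewards are in [0, 1].
    Q = [[[float(H)] * n_actions for _ in range(n_states)] for _ in range(H)]
    N = [[[0] * n_actions for _ in range(n_states)] for _ in range(H)]
    log_term = math.log(episodes * H + 1)
    returns = []
    for _ in range(episodes):
        s, total = 0, 0.0
        for h in range(H):
            # Act greedily with respect to the optimistic estimates.
            a = max(range(n_actions), key=lambda b: Q[h][s][b])
            r, s_next = step(s, a, rng)
            N[h][s][a] += 1
            t = N[h][s][a]
            alpha = (H + 1) / (H + t)  # assumed learning-rate schedule
            # Assumed bonus: shrinks as the pair (s, a) is visited more often.
            bonus = c * math.sqrt(H ** 3 * log_term / t)
            # Value of the next stage, clipped at the largest possible return.
            v_next = 0.0 if h == H - 1 else min(float(H), max(Q[h + 1][s_next]))
            Q[h][s][a] = (1 - alpha) * Q[h][s][a] + alpha * (r + v_next + bonus)
            s, total = s_next, total + r
        returns.append(total)
    return Q, N, returns
```

Calling `optimistic_q_learning(3, 2, 4, 200, chain_step, c=0.1)` returns the learned table, the visit counts, and the per-episode returns. Rarely visited state-action pairs keep inflated estimates, so the greedy policy is drawn toward them until enough samples accumulate; this is the exploration mechanism that optimism provides without a separate exploration schedule.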
Period: 11 May 2020
Event title: AAMAS 2020: The 19th International Conference on Autonomous Agents and Multi-Agent Systems
Event type: Conference
Conference number: 19th
Location: Auckland, New Zealand
Degree of Recognition: International