Maximizing Information Gain in Partially Observable Environments via Prediction Rewards

Yash Satsangi, Sungsu Lim, Shimon Whiteson, Frans A. Oliehoek, Martha White

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

Information gathering in a partially observable environment can be formulated as a reinforcement learning (RL) problem, where the reward depends on the agent's uncertainty. For example, the reward can be the negative entropy of the agent's belief over an unknown (or hidden) variable. Typically, the rewards of an RL agent are defined as a function of the state-action pairs and not as a function of the belief of the agent; this hinders the direct application of deep RL methods for such tasks. This paper tackles the challenge of using belief-based rewards for a deep RL agent by offering a simple insight: maximizing any convex function of the belief of the agent can be approximated by instead maximizing a prediction reward, that is, a reward based on prediction accuracy. In particular, we derive the exact error between negative entropy and the expected prediction reward. This insight provides theoretical motivation for several fields using prediction rewards (namely visual attention, question answering systems, and intrinsic motivation) and highlights their connection to the usually distinct fields of active perception, active sensing, and sensor placement. Based on this insight, we present deep anticipatory networks (DANs), which enable an agent to take actions to reduce its uncertainty without performing explicit belief inference. We present two applications of DANs: building a sensor selection system for tracking people in a shopping mall and learning discrete models of attention on fashion MNIST and MNIST digit classification.
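
As a rough illustration of the two quantities the abstract compares, the sketch below computes the negative entropy of a discrete belief alongside the expected reward for predicting the most likely hidden state. It is a minimal sketch, not the paper's derivation: the 0/1 reward for a correct prediction, the choice of predicting the most likely state, the function names, and the example beliefs are all assumptions made for illustration.

```python
import math


def negative_entropy(belief):
    """Negative Shannon entropy (in nats) of a discrete belief.

    Terms with zero probability are skipped, following the convention 0 * log 0 = 0.
    """
    return sum(p * math.log(p) for p in belief if p > 0.0)


def expected_prediction_reward(belief):
    """Expected 0/1 reward for predicting the most likely hidden state.

    Assumption: the hidden state is distributed according to `belief`, so the
    most likely state is the correct prediction with probability max(belief).
    """
    return max(belief)


# Hypothetical beliefs over four hidden states, from most to least uncertain.
beliefs = {
    "uniform": [0.25, 0.25, 0.25, 0.25],
    "peaked": [0.7, 0.1, 0.1, 0.1],
    "certain": [1.0, 0.0, 0.0, 0.0],
}

for name, b in beliefs.items():
    print(name, round(negative_entropy(b), 3), expected_prediction_reward(b))
```

In these three examples both quantities rise as the belief concentrates on a single state, which is the intuition behind replacing a belief-based reward with one based on prediction accuracy.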
Original language: English
Title of host publication: Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2020
Editors: Bo An, Amal El Fallah Seghrouchni, Gita Sukthankar
Place of Publication: Richland, SC
Publisher: International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
Pages: 1215–1223
Number of pages: 9
ISBN (Electronic): 9781450375184
ISBN (Print): 9781450375184
Publication status: Published - 9 May 2020
Event: AAMAS 2020: The 19th International Conference on Autonomous Agents and Multi-Agent Systems - Auckland, New Zealand
Duration: 9 May 2020 → 13 May 2020
Conference number: 19th
https://aamas2020.conference.auckland.ac.nz

Publication series

Name: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
Volume: 2020-May
ISSN (Print): 1548-8403
ISSN (Electronic): 1558-2914

Conference

Conference: AAMAS 2020
Country: New Zealand
City: Auckland
Period: 9/05/20 → 13/05/20
Other: Virtual/online event due to COVID-19

Keywords

  • Information gain
  • Partial observability
  • Reinforcement learning
