Improving Offline Value-Function Approximations for POMDPs by Reducing Discount Factors

Yi Chun Chen, Mykel J. Kochenderfer, Matthijs T.J. Spaan

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

1 Citation (Scopus)

Abstract

A common solution criterion for partially observable Markov decision processes (POMDPs) is to maximize the expected sum of exponentially discounted rewards, for which a variety of approximate methods have been proposed. Those that plan in the belief space typically provide tighter performance guarantees, but those that plan over the state space (e.g., QMDP and FIB) often require much less memory and computation. This paper presents an encouraging result showing that reducing the discount factor while planning in the state space can actually improve performance significantly when evaluated on the original problem. This phenomenon is confirmed by both a theoretical analysis and a series of empirical studies on benchmark problems. As predicted by the theory and confirmed empirically, the phenomenon is most prominent when the observation model is noisy or rewards are sparse.
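
To make the idea concrete, the following is a minimal sketch, not the authors' implementation, of standard QMDP planning in which the discount factor used for planning is a separate parameter from the one used for evaluation. The two-state, two-action model, the array layout (`T[a, s, s2]`, `R[s, a]`), and the values 0.95 and 0.5 are hypothetical choices made purely for illustration.

```python
import numpy as np

def qmdp_q_values(T, R, gamma, tol=1e-8, max_iter=10_000):
    """Value iteration on the underlying (fully observable) MDP.

    T[a, s, s2] is the probability of moving from s to s2 under action a;
    R[s, a] is the immediate reward. Returns Q with shape (n_states, n_actions).
    """
    n_actions, n_states, _ = T.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(max_iter):
        V = Q.max(axis=1)
        # Q(s, a) = R(s, a) + gamma * sum_{s2} T(s2 | s, a) * V(s2)
        Q_new = R + gamma * np.einsum("ast,t->sa", T, V)
        if np.abs(Q_new - Q).max() < tol:
            return Q_new
        Q = Q_new
    return Q

def qmdp_action(belief, Q):
    """QMDP policy: pick the action maximizing the belief-weighted Q-value."""
    return int(np.argmax(belief @ Q))

# Hypothetical toy model: action 0 stays in place, action 1 switches state.
T = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
# Reward 1 only for staying in state 1.
R = np.array([[0.0, 0.0],
              [1.0, 0.0]])

gamma_eval = 0.95  # discount factor of the original problem (assumed value)
gamma_plan = 0.5   # reduced discount factor used only for planning (assumed value)

Q_eval = qmdp_q_values(T, R, gamma_eval)
Q_plan = qmdp_q_values(T, R, gamma_plan)

belief = np.array([0.5, 0.5])
action = qmdp_action(belief, Q_plan)
```

In a full experiment, the policy derived from `Q_plan` would be simulated on the POMDP and its return accumulated with `gamma_eval`, so that policies planned under different discount factors are compared on the original criterion. This toy model has no observation noise and is not meant to reproduce the effect reported in the paper.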

Original language: English
Title of host publication: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2018
Editors: Carlos Balaguer, Hajime Asama, Danica Kragic, Kevin Lynch
Place of Publication: Piscataway, NJ, USA
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 3531-3536
Number of pages: 6
ISBN (Electronic): 978-1-5386-8094-0
DOIs
Publication status: Published - 2018
Event: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2018 - Madrid, Spain
Duration: 1 Oct 2018 to 5 Oct 2018

Conference

Conference: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2018
Country: Spain
City: Madrid
Period: 1/10/18 to 5/10/18
