Abstract
Partially Observable Markov Decision Processes (POMDPs) are a popular formalism for sequential decision making in partially observable environments. Since solving POMDPs to optimality is a difficult task, point-based value iteration methods are widely used. These methods compute an approximate POMDP solution, and in some cases they even provide guarantees on the solution quality, but these algorithms have been designed for problems with an infinite planning horizon. In this paper we discuss why state-of-the-art point-based algorithms cannot be easily applied to finite-horizon problems that do not include discounting. Subsequently, we present a general point-based value iteration algorithm for finite-horizon problems which provides solutions with guarantees on solution quality. Furthermore, we introduce two heuristics to reduce the number of belief points considered during execution, which lowers the computational requirements. In experiments we demonstrate that the algorithm is an effective method for solving finite-horizon POMDPs.
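The abstract refers to point-based value iteration in the undiscounted, finite-horizon setting. The sketch below is a generic, time-indexed point-based backup intended only to illustrate that setting; it is not the algorithm proposed in the paper. The function names (`point_based_backup`, `finite_horizon_pbvi`), the array layouts for `T`, `O`, and `R`, and the per-step belief sets are assumptions made for this example.

```python
import numpy as np

def point_based_backup(b, Gamma_next, T, O, R):
    """Back up belief b against the alpha-vector set of the next time step.

    b          : belief over states, shape (S,)
    Gamma_next : list of alpha vectors for step t+1, each of shape (S,)
    T          : transition model, shape (A, S, S), T[a, s, s'] = P(s' | s, a)
    O          : observation model, shape (A, S, Z), O[a, s', o] = P(o | s', a)
    R          : immediate rewards, shape (A, S)

    Returns the best alpha vector for b at step t. Note there is no
    discount factor: rewards are summed over the remaining steps only.
    """
    A, S, _ = T.shape
    Z = O.shape[2]
    Gamma_next = np.asarray(Gamma_next)            # shape (K, S)

    best_alpha, best_value = None, -np.inf
    for a in range(A):
        alpha_a = R[a].astype(float)               # start from the immediate reward
        for o in range(Z):
            # g[k, s] = sum_{s'} P(s'|s,a) P(o|s',a) * alpha_k(s')
            g = Gamma_next @ (T[a] * O[a][:, o]).T
            # pick the next-step vector that is best for belief b under (a, o)
            alpha_a += g[np.argmax(g @ b)]
        value = alpha_a @ b
        if value > best_value:
            best_alpha, best_value = alpha_a, value
    return best_alpha

def finite_horizon_pbvi(beliefs_per_step, T, O, R, horizon):
    """Backward pass over time steps, keeping a separate vector set per step."""
    S = T.shape[1]
    Gamma = {horizon + 1: [np.zeros(S)]}           # no value beyond the horizon
    for t in range(horizon, 0, -1):
        Gamma[t] = [point_based_backup(b, Gamma[t + 1], T, O, R)
                    for b in beliefs_per_step[t]]
    return Gamma
```

The key structural difference from the infinite-horizon case is visible here: because there is no discounting, values are time-dependent, so a separate alpha-vector set is maintained for each remaining time step rather than a single stationary set.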
| Original language | English |
| --- | --- |
| Pages (from-to) | 307-341 |
| Number of pages | 35 |
| Journal | The Journal of Artificial Intelligence Research |
| Volume | 65 |
| Publication status | Published - 2019 |