The cross-entropy method for policy search in decentralized POMDPs

Frans A. Oliehoek; Julian F.P. Kooij; Nikos Vlassis

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Decentralized POMDPs (Dec-POMDPs) are becoming increasingly popular as models for multiagent planning under uncertainty, but solving a Dec-POMDP exactly is known to be an intractable combinatorial optimization problem. In this paper we apply the Cross-Entropy (CE) method, a recently introduced method for combinatorial optimization, to Dec-POMDPs, resulting in a randomized (sampling-based) algorithm for approximately solving Dec-POMDPs. This algorithm operates by sampling pure policies from an appropriately parametrized stochastic policy, and then evaluates these policies either exactly or approximately in order to define the next stochastic policy to sample from, and so on until convergence. Experimental results demonstrate that the CE method can search huge spaces efficiently, supporting our claim that combinatorial optimization methods can bring leverage to the approximate solution of Dec-POMDPs.
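
The sample, evaluate, and update loop described in the abstract can be illustrated with a minimal sketch of a generic cross-entropy search over vectors of discrete choices. This is not the paper's Dec-POMDP algorithm: the function name `cross_entropy_search`, the toy objective `toy_score`, the hidden `TARGET` vector, and all parameter values (sample count, elite size, smoothing factor `alpha`) are assumptions chosen for illustration. In a Dec-POMDP setting, each sampled vector would stand in for a pure joint policy and the scoring function for an exact or approximate policy evaluation.

```python
import random


def cross_entropy_search(num_vars, num_choices, score, iterations=30,
                         samples=100, elite=10, alpha=0.7, seed=0):
    """Generic CE loop over vectors of discrete choices (toy sketch)."""
    rng = random.Random(seed)
    # Start from a uniform categorical distribution per decision variable.
    probs = [[1.0 / num_choices] * num_choices for _ in range(num_vars)]
    best, best_score = None, float("-inf")
    for _ in range(iterations):
        # Sample candidate assignments from the current distribution.
        batch = [
            [rng.choices(range(num_choices), weights=p)[0] for p in probs]
            for _ in range(samples)
        ]
        # Evaluate every sample and rank from best to worst.
        scored = sorted(((score(x), x) for x in batch),
                        key=lambda t: t[0], reverse=True)
        if scored[0][0] > best_score:
            best_score, best = scored[0]
        elites = [x for _, x in scored[:elite]]
        # Re-estimate each categorical from elite frequencies,
        # smoothed with the previous distribution by alpha.
        for i in range(num_vars):
            for c in range(num_choices):
                freq = sum(1 for x in elites if x[i] == c) / elite
                probs[i][c] = alpha * freq + (1 - alpha) * probs[i][c]
    return best, best_score, probs


# Hypothetical toy objective: count positions that match a hidden target.
TARGET = [2, 0, 1, 1, 0, 2]


def toy_score(x):
    return sum(1 for a, b in zip(x, TARGET) if a == b)


best, best_score, probs = cross_entropy_search(len(TARGET), 3, toy_score)
print(best, best_score)
```

The smoothing step keeps every choice at a nonzero probability, so the sampling distribution narrows gradually around high-scoring samples rather than collapsing after a single iteration.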

Original language	English
Pages (from-to)	341-357
Number of pages	17
Journal	Informatica (Ljubljana)
Volume	32
Issue number	4
Publication status	Published - 2008
Externally published	Yes

Keywords

Combinatorial optimization
Decentralized POMDPs
Multiagent planning

Cite this

@article{5bcd2e92634749eaa5c476489c2a4e68,
title = "The cross-entropy method for policy search in decentralized POMDPs",
abstract = "Decentralized POMDPs (Dec-POMDPs) are becoming increasingly popular as models for multiagent planning under uncertainty, but solving a Dec-POMDP exactly is known to be an intractable combinatorial optimization problem. In this paper we apply the Cross-Entropy (CE) method, a recently introduced method for combinatorial optimization, to Dec-POMDPs, resulting in a randomized (sampling-based) algorithm for approximately solving Dec-POMDPs. This algorithm operates by sampling pure policies from an appropriately parametrized stochastic policy, and then evaluates these policies either exactly or approximately in order to define the next stochastic policy to sample from, and so on until convergence. Experimental results demonstrate that the CE method can search huge spaces efficiently, supporting our claim that combinatorial optimization methods can bring leverage to the approximate solution of Dec-POMDPs.",
keywords = "Combinatorial optimization, Decentralized POMDPs, Multiagent planning",
author = "Oliehoek, {Frans A.} and Kooij, {Julian F.P.} and Nikos Vlassis",
year = "2008",
language = "English",
volume = "32",
pages = "341--357",
journal = "Informatica (Ljubljana)",
issn = "0350-5596",
publisher = "Slovene Society Informatika",
number = "4",
}

TY  - JOUR
T1  - The cross-entropy method for policy search in decentralized POMDPs
AU  - Oliehoek, Frans A.
AU  - Kooij, Julian F.P.
AU  - Vlassis, Nikos
PY  - 2008
Y1  - 2008
N2  - Decentralized POMDPs (Dec-POMDPs) are becoming increasingly popular as models for multiagent planning under uncertainty, but solving a Dec-POMDP exactly is known to be an intractable combinatorial optimization problem. In this paper we apply the Cross-Entropy (CE) method, a recently introduced method for combinatorial optimization, to Dec-POMDPs, resulting in a randomized (sampling-based) algorithm for approximately solving Dec-POMDPs. This algorithm operates by sampling pure policies from an appropriately parametrized stochastic policy, and then evaluates these policies either exactly or approximately in order to define the next stochastic policy to sample from, and so on until convergence. Experimental results demonstrate that the CE method can search huge spaces efficiently, supporting our claim that combinatorial optimization methods can bring leverage to the approximate solution of Dec-POMDPs.
AB  - Decentralized POMDPs (Dec-POMDPs) are becoming increasingly popular as models for multiagent planning under uncertainty, but solving a Dec-POMDP exactly is known to be an intractable combinatorial optimization problem. In this paper we apply the Cross-Entropy (CE) method, a recently introduced method for combinatorial optimization, to Dec-POMDPs, resulting in a randomized (sampling-based) algorithm for approximately solving Dec-POMDPs. This algorithm operates by sampling pure policies from an appropriately parametrized stochastic policy, and then evaluates these policies either exactly or approximately in order to define the next stochastic policy to sample from, and so on until convergence. Experimental results demonstrate that the CE method can search huge spaces efficiently, supporting our claim that combinatorial optimization methods can bring leverage to the approximate solution of Dec-POMDPs.
KW  - Combinatorial optimization
KW  - Decentralized POMDPs
KW  - Multiagent planning
UR  - http://www.scopus.com/inward/record.url?scp=57349184659&partnerID=8YFLogxK
M3  - Article
AN  - SCOPUS:57349184659
SN  - 0350-5596
VL  - 32
SP  - 341
EP  - 357
JO  - Informatica (Ljubljana)
JF  - Informatica (Ljubljana)
IS  - 4
ER  -
