Long-term values in Markov decision processes, (Co)algebraically

Frank M.V. Feys; Helle Hvid Hansen; Lawrence S. Moss

doi:10.1007/978-3-030-00389-0_6

Frank M.V. Feys, Helle Hvid Hansen^*, Lawrence S. Moss

^*Corresponding author for this work

Energy and Industry

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

2 Citations (Scopus)

Abstract

This paper studies Markov decision processes (MDPs) from the categorical perspective of coalgebra and algebra. Probabilistic systems, similar to MDPs but without rewards, have been extensively studied, also coalgebraically, from the perspective of program semantics. In this paper, we focus on the role of MDPs as models in optimal planning, where the reward structure is central. The main contributions of this paper are (i) to give a coinductive explanation of policy improvement using a new proof principle, based on Banach’s Fixpoint Theorem, that we call contraction coinduction, and (ii) to show that the long-term value function of a policy with respect to discounted sums can be obtained via a generalized notion of corecursive algebra, which is designed to take boundedness into account. We also explore boundedness features of the Kantorovich lifting of the distribution monad to metric spaces.
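The abstract's two technical ingredients, the long-term value of a policy under discounted sums and policy improvement, can be illustrated numerically on a finite MDP. The following is a minimal Python sketch under illustrative assumptions, not the coalgebraic construction from the paper: rewards are attached to states, each (state, action) pair has a finite transition distribution, policies are memoryless and deterministic, and 0 ≤ γ < 1. Under these assumptions a policy's Bellman operator is a γ-contraction in the sup-norm, so iterating it from the zero function (as in the proof of Banach's Fixpoint Theorem) converges to the policy's unique long-term value; alternating this evaluation with greedy improvement is classic policy iteration. The function names `bellman`, `long_term_value`, `improve` and `policy_iteration`, and the toy two-state MDP, are hypothetical.

```python
from typing import Dict, Tuple

State = str
Action = str
Dist = Dict[State, float]                 # finite distribution over states
Trans = Dict[State, Dict[Action, Dist]]   # transition structure (assumed shape)
Policy = Dict[State, Action]              # memoryless deterministic policy
Values = Dict[State, float]


def bellman(v: Values, policy: Policy, reward: Values, trans: Trans, gamma: float) -> Values:
    """One step of the policy's Bellman operator. For 0 <= gamma < 1 it is a
    gamma-contraction in the sup-norm, so it has a unique fixpoint."""
    return {
        s: reward[s] + gamma * sum(p * v[t] for t, p in trans[s][policy[s]].items())
        for s in reward
    }


def long_term_value(policy: Policy, reward: Values, trans: Trans,
                    gamma: float, tol: float = 1e-10) -> Values:
    """Banach-style iteration from the zero function; stops once successive
    iterates differ by less than tol in the sup-norm."""
    v = {s: 0.0 for s in reward}
    while True:
        w = bellman(v, policy, reward, trans, gamma)
        if max(abs(w[s] - v[s]) for s in reward) < tol:
            return w
        v = w


def improve(policy: Policy, v: Values, trans: Trans) -> Policy:
    """Greedy one-step improvement. Rewards sit on states here, so comparing
    expected successor values is enough; the current action is kept unless
    another one is strictly better, which avoids cycling on ties."""
    new: Policy = {}
    for s, acts in trans.items():
        q = {a: sum(p * v[t] for t, p in dist.items()) for a, dist in acts.items()}
        best = max(q, key=q.get)
        new[s] = best if q[best] > q[policy[s]] + 1e-9 else policy[s]
    return new


def policy_iteration(policy: Policy, reward: Values, trans: Trans,
                     gamma: float) -> Tuple[Policy, Values]:
    """Alternate evaluation and improvement until the policy is stable."""
    while True:
        v = long_term_value(policy, reward, trans, gamma)
        new_policy = improve(policy, v, trans)
        if new_policy == policy:
            return policy, v
        policy = new_policy


# Toy two-state MDP (hypothetical data): only state "b" is rewarding.
REWARD: Values = {"a": 0.0, "b": 1.0}
TRANS: Trans = {
    "a": {"stay": {"a": 1.0}, "move": {"b": 1.0}},
    "b": {"stay": {"b": 1.0}, "move": {"a": 1.0}},
}
best_policy, best_values = policy_iteration({"a": "stay", "b": "stay"}, REWARD, TRANS, 0.9)
print(best_policy, best_values)
```

Starting from "always stay", the sketch settles on moving from "a" to "b" and then staying in "b", with values close to 10 for "b" (1/(1 − 0.9), staying forever) and 9 for "a" (0.9 · 10). Contraction coinduction, as the abstract describes it, is a proof principle about such fixpoints; the code only approximates them numerically.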

Original language	English
Title of host publication	Coalgebraic Methods in Computer Science - 14th IFIP WG 1.3 International Workshop, CMCS 2018, Colocated with ETAPS 2018, Revised Selected Papers
Publisher	Springer
Pages	78-99
Number of pages	22
Volume	11202 LNCS
ISBN (Print)	9783030003883
DOIs	https://doi.org/10.1007/978-3-030-00389-0_6
Publication status	Published - 2018
Event	14th International Workshop on Coalgebraic Methods in Computer Science, CMCS 2018 Colocated with ETAPS 2018 - Thessaloniki, Greece
Duration	14 Apr 2018 → 15 Apr 2018

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	11202 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	14th International Workshop on Coalgebraic Methods in Computer Science, CMCS 2018 Colocated with ETAPS 2018
Country/Territory	Greece
City	Thessaloniki
Period	14/04/18 → 15/04/18

Keywords

Algebra
Coalgebra
Corecursive algebra
Discounted sum
Fixpoint
Long-term value
Markov decision process
Metric space

Cite this

Feys, F. M. V., Hansen, H. H., & Moss, L. S. (2018). Long-term values in Markov decision processes, (Co)algebraically. In Coalgebraic Methods in Computer Science - 14th IFIP WG 1.3 International Workshop, CMCS 2018, Colocated with ETAPS 2018, Revised Selected Papers (Vol. 11202 LNCS, pp. 78-99). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11202 LNCS). Springer. https://doi.org/10.1007/978-3-030-00389-0_6

Feys, Frank M.V. ; Hansen, Helle Hvid ; Moss, Lawrence S. / Long-term values in Markov decision processes, (Co)algebraically. Coalgebraic Methods in Computer Science - 14th IFIP WG 1.3 International Workshop, CMCS 2018, Colocated with ETAPS 2018, Revised Selected Papers. Vol. 11202 LNCS Springer, 2018. pp. 78-99 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{3c60f47684bd47f08bada048d74f37c9,
  title = "Long-term values in {Markov} decision processes, {(Co)}algebraically",
  abstract = "This paper studies Markov decision processes (MDPs) from the categorical perspective of coalgebra and algebra. Probabilistic systems, similar to MDPs but without rewards, have been extensively studied, also coalgebraically, from the perspective of program semantics. In this paper, we focus on the role of MDPs as models in optimal planning, where the reward structure is central. The main contributions of this paper are (i) to give a coinductive explanation of policy improvement using a new proof principle, based on Banach{\textquoteright}s Fixpoint Theorem, that we call contraction coinduction, and (ii) to show that the long-term value function of a policy with respect to discounted sums can be obtained via a generalized notion of corecursive algebra, which is designed to take boundedness into account. We also explore boundedness features of the Kantorovich lifting of the distribution monad to metric spaces.",
  keywords = "Algebra, Coalgebra, Corecursive algebra, Discounted sum, Fixpoint, Long-term value, Markov decision process, Metric space",
  author = "Feys, {Frank M.V.} and Hansen, {Helle Hvid} and Moss, {Lawrence S.}",
  year = "2018",
  doi = "10.1007/978-3-030-00389-0_6",
  language = "English",
  isbn = "9783030003883",
  volume = "11202",
  series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
  publisher = "Springer",
  pages = "78--99",
  booktitle = "Coalgebraic Methods in Computer Science - 14th IFIP WG 1.3 International Workshop, CMCS 2018, Colocated with ETAPS 2018, Revised Selected Papers",
  note = "14th International Workshop on Coalgebraic Methods in Computer Science, CMCS 2018 Colocated with ETAPS 2018 ; Conference date: 14-04-2018 Through 15-04-2018",
}

Feys, FMV, Hansen, HH & Moss, LS 2018, Long-term values in Markov decision processes, (Co)algebraically. in Coalgebraic Methods in Computer Science - 14th IFIP WG 1.3 International Workshop, CMCS 2018, Colocated with ETAPS 2018, Revised Selected Papers. vol. 11202 LNCS, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11202 LNCS, Springer, pp. 78-99, 14th International Workshop on Coalgebraic Methods in Computer Science, CMCS 2018 Colocated with ETAPS 2018, Thessaloniki, Greece, 14/04/18. https://doi.org/10.1007/978-3-030-00389-0_6

Long-term values in Markov decision processes, (Co)algebraically. / Feys, Frank M.V.; Hansen, Helle Hvid; Moss, Lawrence S.
Coalgebraic Methods in Computer Science - 14th IFIP WG 1.3 International Workshop, CMCS 2018, Colocated with ETAPS 2018, Revised Selected Papers. Vol. 11202 LNCS Springer, 2018. p. 78-99 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11202 LNCS).

TY  - GEN
T1  - Long-term values in Markov decision processes, (Co)algebraically
AU  - Feys, Frank M.V.
AU  - Hansen, Helle Hvid
AU  - Moss, Lawrence S.
PY  - 2018
Y1  - 2018
N2  - This paper studies Markov decision processes (MDPs) from the categorical perspective of coalgebra and algebra. Probabilistic systems, similar to MDPs but without rewards, have been extensively studied, also coalgebraically, from the perspective of program semantics. In this paper, we focus on the role of MDPs as models in optimal planning, where the reward structure is central. The main contributions of this paper are (i) to give a coinductive explanation of policy improvement using a new proof principle, based on Banach’s Fixpoint Theorem, that we call contraction coinduction, and (ii) to show that the long-term value function of a policy with respect to discounted sums can be obtained via a generalized notion of corecursive algebra, which is designed to take boundedness into account. We also explore boundedness features of the Kantorovich lifting of the distribution monad to metric spaces.
AB  - This paper studies Markov decision processes (MDPs) from the categorical perspective of coalgebra and algebra. Probabilistic systems, similar to MDPs but without rewards, have been extensively studied, also coalgebraically, from the perspective of program semantics. In this paper, we focus on the role of MDPs as models in optimal planning, where the reward structure is central. The main contributions of this paper are (i) to give a coinductive explanation of policy improvement using a new proof principle, based on Banach’s Fixpoint Theorem, that we call contraction coinduction, and (ii) to show that the long-term value function of a policy with respect to discounted sums can be obtained via a generalized notion of corecursive algebra, which is designed to take boundedness into account. We also explore boundedness features of the Kantorovich lifting of the distribution monad to metric spaces.
KW  - Algebra
KW  - Coalgebra
KW  - Corecursive algebra
KW  - Discounted sum
KW  - Fixpoint
KW  - Long-term value
KW  - Markov decision process
KW  - Metric space
UR  - http://www.scopus.com/inward/record.url?scp=85057297578&partnerID=8YFLogxK
U2  - 10.1007/978-3-030-00389-0_6
DO  - 10.1007/978-3-030-00389-0_6
M3  - Conference contribution
AN  - SCOPUS:85057297578
SN  - 9783030003883
VL  - 11202
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 78
EP  - 99
BT  - Coalgebraic Methods in Computer Science - 14th IFIP WG 1.3 International Workshop, CMCS 2018, Colocated with ETAPS 2018, Revised Selected Papers
PB  - Springer
T2  - 14th International Workshop on Coalgebraic Methods in Computer Science, CMCS 2018 Colocated with ETAPS 2018
Y2  - 14 April 2018 through 15 April 2018
ER  -

Feys FMV, Hansen HH, Moss LS. Long-term values in Markov decision processes, (Co)algebraically. In Coalgebraic Methods in Computer Science - 14th IFIP WG 1.3 International Workshop, CMCS 2018, Colocated with ETAPS 2018, Revised Selected Papers. Vol. 11202 LNCS. Springer. 2018. p. 78-99. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-030-00389-0_6
