Long-term values in markov decision processes, (Co)algebraically

Frank M.V. Feys, Helle Hvid Hansen, Lawrence S. Moss

Research output: Chapter in Book/Conference proceedings/Edited volumeConference contributionScientificpeer-review

1 Citation (Scopus)

Abstract

This paper studies Markov decision processes (MDPs) from the categorical perspective of coalgebra and algebra. Probabilistic systems, similar to MDPs but without rewards, have been extensively studied, also coalgebraically, from the perspective of program semantics. In this paper, we focus on the role of MDPs as models in optimal planning, where the reward structure is central. The main contributions of this paper are (i) to give a coinductive explanation of policy improvement using a new proof principle, based on Banach’s Fixpoint Theorem, that we call contraction coinduction, and (ii) to show that the long-term value function of a policy with respect to discounted sums can be obtained via a generalized notion of corecursive algebra, which is designed to take boundedness into account. We also explore boundedness features of the Kantorovich lifting of the distribution monad to metric spaces.

Original languageEnglish
Title of host publicationCoalgebraic Methods in Computer Science - 14th IFIP WG 1.3 International Workshop, CMCS 2018, Colocated with ETAPS 2018, Revised Selected Papers
PublisherSpringer
Pages78-99
Number of pages22
Volume11202 LNCS
ISBN (Print)9783030003883
DOIs
Publication statusPublished - 2018
Event14th International Workshop on Coalgebraic Methods in Computer Science, CMCS 2018 Colocated with ETAPS 2018 - Thessaloniki, Greece
Duration: 14 Apr 201815 Apr 2018

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume11202 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference14th International Workshop on Coalgebraic Methods in Computer Science, CMCS 2018 Colocated with ETAPS 2018
CountryGreece
CityThessaloniki
Period14/04/1815/04/18

Keywords

  • Algebra
  • Coalgebra
  • Corecursive algebra
  • Discounted sum
  • Fixpoint
  • Long-term value
  • Markov decision process
  • Metric space

Fingerprint Dive into the research topics of 'Long-term values in markov decision processes, (Co)algebraically'. Together they form a unique fingerprint.

Cite this