Discovery of Optimal Solution Horizons in Non-Stationary Markov Decision Processes with Unbounded Rewards

Greg Neustroev; Mathijs de Weerdt; Remco Verzijlbergh

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

Infinite-horizon non-stationary Markov decision processes provide a general framework to model many real-life decision-making problems, e.g., planning equipment maintenance. Unfortunately, these problems are notoriously difficult to solve, due to their infinite dimensionality. Often, only the optimality of the initial action is of importance to the decision-maker: once it has been identified, the procedure can be repeated to generate a plan of arbitrary length. The optimal initial action can be identified by finding a time horizon so long that data beyond it has no effect on the initial decision. This horizon is known as a solution horizon and can be discovered by considering a series of truncations of the problem until a stopping rule guaranteeing initial decision optimality is satisfied. We present such a stopping rule for problems with unbounded rewards. Given a candidate policy, the rule uses a mathematical program that searches for other possibly optimal initial actions within the space of feasible truncations. If no better action can be found, the candidate action is deemed optimal. Our rule runs faster than comparable rules and discovers shorter, more efficient solution horizons.
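The truncate-and-test loop described in the abstract can be illustrated with a minimal sketch. It does not implement the stopping rule of the paper, which handles unbounded rewards through a mathematical program. Instead, it swaps in a simpler textbook-style rule that assumes discounting with factor `gamma` and rewards bounded in `[r_min, r_max]`: the initial action is certified once its lead over the runner-up in the truncated problem is at least the largest possible contribution of the discarded tail. The names `reward`, `transition`, `initial_q_values`, and `discover_horizon` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical model interface (not from the paper):
#   reward(t, s, a)     -> immediate reward at time t in state s under action a
#   transition(t, s, a) -> list of probabilities over next states
Reward = Callable[[int, int, int], float]
Transition = Callable[[int, int, int], List[float]]


def initial_q_values(T: int, n_states: int, n_actions: int,
                     reward: Reward, transition: Transition,
                     gamma: float) -> List[List[float]]:
    """Backward induction on the T-step truncation (T >= 1) with zero terminal value.

    Returns the time-0 action values q[s][a] of the truncated problem.
    """
    v = [0.0] * n_states  # terminal values at time T
    q: List[List[float]] = []
    for t in range(T - 1, -1, -1):
        q = [[reward(t, s, a)
              + gamma * sum(p * v[s2] for s2, p in enumerate(transition(t, s, a)))
              for a in range(n_actions)]
             for s in range(n_states)]
        v = [max(row) for row in q]
    return q


def discover_horizon(s0: int, n_states: int, n_actions: int,
                     reward: Reward, transition: Transition,
                     gamma: float, r_min: float, r_max: float,
                     max_horizon: int = 1000) -> Optional[Tuple[int, int]]:
    """Grow the truncation until the bounded-reward stopping rule fires.

    Returns (initial action, horizon), or None if no horizon up to
    max_horizon certifies an action (for example, when two initial
    actions are tied, the gap never exceeds the tail bound).
    """
    for T in range(1, max_horizon + 1):
        q = initial_q_values(T, n_states, n_actions, reward, transition, gamma)[s0]
        best = max(range(n_actions), key=q.__getitem__)
        runner_up = max((q[a] for a in range(n_actions) if a != best),
                        default=float("-inf"))
        # Rewards after time T can change any time-0 action value by an
        # amount between gamma^T * r_min / (1 - gamma) and
        # gamma^T * r_max / (1 - gamma), so a gap at least as wide as that
        # interval cannot be overturned by data beyond the horizon.
        tail = gamma ** T * (r_max - r_min) / (1.0 - gamma)
        if q[best] - runner_up >= tail:
            return best, T
    return None
```

Because the tail bound depends on `r_max - r_min`, this simplified rule cannot be applied when rewards are unbounded, which is the setting the paper addresses.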

Original language	English
Title of host publication	Proceedings of the 29th International Conference on Automated Planning and Scheduling, ICAPS 2019
Editors	J. Benton, Nir Lipovetzky, Eva Onaindia, David E. Smith, Siddharth Srivastava
Publisher	Association for the Advancement of Artificial Intelligence (AAAI)
Pages	292-300
Number of pages	9
Volume	29
ISBN (Electronic)	9781577358077
Publication status	Published - 2019
Event	29th International Conference on Automated Planning and Scheduling, Berkeley, United States; Duration: 11 Jul 2019 → 15 Jul 2019; Conference number: 29

Publication series

Name	Proceedings International Conference on Automated Planning and Scheduling, ICAPS
ISSN (Print)	2334-0835
ISSN (Electronic)	2334-0843

Conference

Conference	29th International Conference on Automated Planning and Scheduling
Abbreviated title	ICAPS
Country/Territory	United States
City	Berkeley
Period	11/07/19 → 15/07/19

Bibliographical note

Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care

Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Access to Document

3491-Article Text-6540-1-10-20190619 (Final published version, 521 KB)

https://aaai.org/ojs/index.php/ICAPS/article/view/3491

Cite this

Neustroev, G., de Weerdt, M., & Verzijlbergh, R. (2019). Discovery of Optimal Solution Horizons in Non-Stationary Markov Decision Processes with Unbounded Rewards. In J. Benton, N. Lipovetzky, E. Onaindia, D. E. Smith, & S. Srivastava (Eds.), Proceedings of the 29th International Conference on Automated Planning and Scheduling, ICAPS 2019 (Vol. 29, pp. 292-300). (Proceedings International Conference on Automated Planning and Scheduling, ICAPS). Association for the Advancement of Artificial Intelligence (AAAI). https://aaai.org/ojs/index.php/ICAPS/article/view/3491

Neustroev, Greg ; de Weerdt, Mathijs ; Verzijlbergh, Remco. / Discovery of Optimal Solution Horizons in Non-Stationary Markov Decision Processes with Unbounded Rewards. Proceedings of the 29th International Conference on Automated Planning and Scheduling, ICAPS 2019. editor / J. Benton ; Nir Lipovetzky ; Eva Onaindia ; David E. Smith ; Siddharth Srivastava. Vol. 29 Association for the Advancement of Artificial Intelligence (AAAI), 2019. pp. 292-300 (Proceedings International Conference on Automated Planning and Scheduling, ICAPS).

@inproceedings{b53d955e75d6481f932e3bd98cc5439b,

title = "Discovery of Optimal Solution Horizons in Non-Stationary Markov Decision Processes with Unbounded Rewards",

abstract = "Infinite-horizon non-stationary Markov decision processes provide a general framework to model many real-life decision-making problems, e.g., planning equipment maintenance. Unfortunately, these problems are notoriously difficult to solve, due to their infinite dimensionality. Often, only the optimality of the initial action is of importance to the decision-maker: once it has been identified, the procedure can be repeated to generate a plan of arbitrary length. The optimal initial action can be identified by finding a time horizon so long that data beyond it has no effect on the initial decision. This horizon is known as a solution horizon and can be discovered by considering a series of truncations of the problem until a stopping rule guaranteeing initial decision optimality is satisfied. We present such a stopping rule for problems with unbounded rewards. Given a candidate policy, the rule uses a mathematical program that searches for other possibly optimal initial actions within the space of feasible truncations. If no better action can be found, the candidate action is deemed optimal. Our rule runs faster than comparable rules and discovers shorter, more efficient solution horizons.",

author = "Greg Neustroev and {de Weerdt}, Mathijs and Remco Verzijlbergh",

note = "Green Open Access added to TU Delft Institutional Repository {\textquoteleft}You share, we take care!{\textquoteright} – Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.; 29th International Conference on Automated Planning and Scheduling, ICAPS ; Conference date: 11-07-2019 Through 15-07-2019",

year = "2019",

language = "English",

volume = "29",

series = "Proceedings International Conference on Automated Planning and Scheduling, ICAPS",

publisher = "Association for the Advancement of Artificial Intelligence (AAAI)",

pages = "292--300",

editor = "J. Benton and Nir Lipovetzky and Eva Onaindia and Smith, {David E.} and Siddharth Srivastava",

booktitle = "Proceedings of the 29th International Conference on Automated Planning and Scheduling, ICAPS 2019",

}

Neustroev, G, de Weerdt, M & Verzijlbergh, R 2019, Discovery of Optimal Solution Horizons in Non-Stationary Markov Decision Processes with Unbounded Rewards. in J Benton, N Lipovetzky, E Onaindia, DE Smith & S Srivastava (eds), Proceedings of the 29th International Conference on Automated Planning and Scheduling, ICAPS 2019. vol. 29, Proceedings International Conference on Automated Planning and Scheduling, ICAPS, Association for the Advancement of Artificial Intelligence (AAAI), pp. 292-300, 29th International Conference on Automated Planning and Scheduling, Berkeley, California, United States, 11/07/19. <https://aaai.org/ojs/index.php/ICAPS/article/view/3491>

Discovery of Optimal Solution Horizons in Non-Stationary Markov Decision Processes with Unbounded Rewards. / Neustroev, Greg ; de Weerdt, Mathijs ; Verzijlbergh, Remco.
Proceedings of the 29th International Conference on Automated Planning and Scheduling, ICAPS 2019. ed. / J. Benton; Nir Lipovetzky; Eva Onaindia; David E. Smith; Siddharth Srivastava. Vol. 29 Association for the Advancement of Artificial Intelligence (AAAI), 2019. p. 292-300 (Proceedings International Conference on Automated Planning and Scheduling, ICAPS).

TY - GEN

T1 - Discovery of Optimal Solution Horizons in Non-Stationary Markov Decision Processes with Unbounded Rewards

AU - Neustroev, Greg

AU - de Weerdt, Mathijs

AU - Verzijlbergh, Remco

N1 - Conference code: 29

PY - 2019

Y1 - 2019

N2 - Infinite-horizon non-stationary Markov decision processes provide a general framework to model many real-life decision-making problems, e.g., planning equipment maintenance. Unfortunately, these problems are notoriously difficult to solve, due to their infinite dimensionality. Often, only the optimality of the initial action is of importance to the decision-maker: once it has been identified, the procedure can be repeated to generate a plan of arbitrary length. The optimal initial action can be identified by finding a time horizon so long that data beyond it has no effect on the initial decision. This horizon is known as a solution horizon and can be discovered by considering a series of truncations of the problem until a stopping rule guaranteeing initial decision optimality is satisfied. We present such a stopping rule for problems with unbounded rewards. Given a candidate policy, the rule uses a mathematical program that searches for other possibly optimal initial actions within the space of feasible truncations. If no better action can be found, the candidate action is deemed optimal. Our rule runs faster than comparable rules and discovers shorter, more efficient solution horizons.

AB - Infinite-horizon non-stationary Markov decision processes provide a general framework to model many real-life decision-making problems, e.g., planning equipment maintenance. Unfortunately, these problems are notoriously difficult to solve, due to their infinite dimensionality. Often, only the optimality of the initial action is of importance to the decision-maker: once it has been identified, the procedure can be repeated to generate a plan of arbitrary length. The optimal initial action can be identified by finding a time horizon so long that data beyond it has no effect on the initial decision. This horizon is known as a solution horizon and can be discovered by considering a series of truncations of the problem until a stopping rule guaranteeing initial decision optimality is satisfied. We present such a stopping rule for problems with unbounded rewards. Given a candidate policy, the rule uses a mathematical program that searches for other possibly optimal initial actions within the space of feasible truncations. If no better action can be found, the candidate action is deemed optimal. Our rule runs faster than comparable rules and discovers shorter, more efficient solution horizons.

UR - http://www.scopus.com/inward/record.url?scp=85085621868&partnerID=8YFLogxK

M3 - Conference contribution

VL - 29

T3 - Proceedings International Conference on Automated Planning and Scheduling, ICAPS

SP - 292

EP - 300

BT - Proceedings of the 29th International Conference on Automated Planning and Scheduling, ICAPS 2019

A2 - Benton, J.

A2 - Lipovetzky, Nir

A2 - Onaindia, Eva

A2 - Smith, David E.

A2 - Srivastava, Siddharth

PB - Association for the Advancement of Artificial Intelligence (AAAI)

T2 - 29th International Conference on Automated Planning and Scheduling

Y2 - 11 July 2019 through 15 July 2019

ER -

Neustroev G, de Weerdt M, Verzijlbergh R. Discovery of Optimal Solution Horizons in Non-Stationary Markov Decision Processes with Unbounded Rewards. In Benton J, Lipovetzky N, Onaindia E, Smith DE, Srivastava S, editors, Proceedings of the 29th International Conference on Automated Planning and Scheduling, ICAPS 2019. Vol. 29. Association for the Advancement of Artificial Intelligence (AAAI). 2019. p. 292-300. (Proceedings International Conference on Automated Planning and Scheduling, ICAPS).
