The Intersection of Planning and Learning

T.M. Moerland

doi:10.4233/uuid:5437884e-0078-4b36-b2c7-c6edfea3b418

Interactive Intelligence

Research output: Thesis › Dissertation (TU Delft)

Abstract

Intelligent sequential decision making is a key challenge in artificial intelligence. The problem, commonly formalized as a Markov Decision Process, is studied in two different research communities: planning and reinforcement learning. Starting from fundamentally different assumptions about the type of access to the environment, the two fields have developed their own solution approaches and conventions. The combination of both fields, known as model-based reinforcement learning, has recently shown state-of-the-art results, for example defeating human experts in classic board games like Chess and Go. Nevertheless, the literature lacks an integrated view of 1) the similarities between planning and learning, and 2) the possible combinations of both. This dissertation aims to fill this gap. The first half of the book presents a conceptual answer to both questions. We first present a framework that disentangles the common algorithmic space of both fields, showing that they essentially face the same algorithmic design decisions. We also present an overview of the different ways in which planning and learning can be combined in one algorithm. The second half of the dissertation provides experimental illustration of these ideas. We present several new combinations of planning and learning, such as a flexible method to learn stochastic dynamics models with neural networks, an extension of a successful planning-learning algorithm (AlphaZero) to deal with continuous action spaces, and a study of the empirical trade-off between planning and learning. Finally, we illustrate the commonalities between the two fields by designing a new algorithm in one field based on inspiration from the other. We conclude the thesis with an outlook for the planning-learning field as a whole.
Altogether, the dissertation provides a broad theoretical and empirical view of the combination of planning and learning, which promises to be an important frontier in artificial intelligence research in the coming years.

Original language	English
Qualification	Doctor of Philosophy
Awarding Institution	Delft University of Technology
Supervisors/Advisors	Jonker, C.M. (Supervisor); Plaat, Aske (Supervisor, External person); Broekens, D.J. (Advisor)
Award date	10 Mar 2021
DOIs	https://doi.org/10.4233/uuid:5437884e-0078-4b36-b2c7-c6edfea3b418
Publication status	Published - 2021

Keywords

Planning
Reinforcement learning
Sequential decision making
Markov decision process

Access to Document

10.4233/uuid:5437884e-0078-4b36-b2c7-c6edfea3b418

Moerland_Intersection_of_Planning_and_Learning: Final published version, 6.04 MB, Licence: CC BY-NC
Moerland_Propositions: Final published version, 36.7 KB, Licence: CC BY-NC

Cite this

@phdthesis{5437884e00784b36b2c7c6edfea3b418,
title = "The Intersection of Planning and Learning",
abstract = "Intelligent sequential decision making is a key challenge in artificial intelligence. The problem, commonly formalized as a Markov Decision Process, is studied in two different research communities: planning and reinforcement learning. Departing from a fundamentally different assumption about the type of access to the environment, both research fields have developed their own solution approaches and conventions. The combination of both fields, known as model-based reinforcement learning, has recently shown state-of-the-art results, for example defeating human experts in classic board games like Chess and Go. Nevertheless, literature lacks an integrated view on 1) the similarities between planning and learning, and 2) the possible combinations of both. This dissertation aims to fill this gap. The first half of the book presents a conceptual answer to both questions. We first present a framework that disentangles the common algorithmic space of both fields, showing that they essentially face the same algorithmic design decisions. Moreover, we also present an overview of the different ways in which planning and learning can be combined in one algorithm. The second half of the dissertation provides experimental illustration of these ideas. We present several new combinations of planning and learning, such as a flexible method to learn stochastic dynamics models with neural networks, an extension of a successful planning-learning algorithm (AlphaZero) to deal with continuous action spaces, and a study of the empirical trade-off between planning and learning. Finally, we also illustrate the commonalities between both fields, by designing a new algorithm in one field based on inspiration from the other field. We conclude the thesis with an outlook for the planning-learning field as a whole. 
Altogether, the dissertation provides a broad theoretical and empirical view on the combination of planning and learning, which promises to be an important frontier in artificial intelligence research in the coming years.",

keywords = "Planning, Reinforcement learning, Sequential decision making, Markov decision process",
author = "T.M. Moerland",
year = "2021",
doi = "10.4233/uuid:5437884e-0078-4b36-b2c7-c6edfea3b418",
language = "English",
type = "Dissertation (TU Delft)",
school = "Delft University of Technology",
}

TY - THES
T1 - The Intersection of Planning and Learning
AU - Moerland, T.M.
PY - 2021
Y1 - 2021
N2 - Intelligent sequential decision making is a key challenge in artificial intelligence. The problem, commonly formalized as a Markov Decision Process, is studied in two different research communities: planning and reinforcement learning. Departing from a fundamentally different assumption about the type of access to the environment, both research fields have developed their own solution approaches and conventions. The combination of both fields, known as model-based reinforcement learning, has recently shown state-of-the-art results, for example defeating human experts in classic board games like Chess and Go. Nevertheless, literature lacks an integrated view on 1) the similarities between planning and learning, and 2) the possible combinations of both. This dissertation aims to fill this gap. The first half of the book presents a conceptual answer to both questions. We first present a framework that disentangles the common algorithmic space of both fields, showing that they essentially face the same algorithmic design decisions. Moreover, we also present an overview of the different ways in which planning and learning can be combined in one algorithm. The second half of the dissertation provides experimental illustration of these ideas. We present several new combinations of planning and learning, such as a flexible method to learn stochastic dynamics models with neural networks, an extension of a successful planning-learning algorithm (AlphaZero) to deal with continuous action spaces, and a study of the empirical trade-off between planning and learning. Finally, we also illustrate the commonalities between both fields, by designing a new algorithm in one field based on inspiration from the other field. We conclude the thesis with an outlook for the planning-learning field as a whole. 
Altogether, the dissertation provides a broad theoretical and empirical view on the combination of planning and learning, which promises to be an important frontier in artificial intelligence research in the coming years.

KW - Planning
KW - Reinforcement learning
KW - Sequential decision making
KW - Markov decision process
U2 - 10.4233/uuid:5437884e-0078-4b36-b2c7-c6edfea3b418
DO - 10.4233/uuid:5437884e-0078-4b36-b2c7-c6edfea3b418
M3 - Dissertation (TU Delft)
ER -
