TY - JOUR
T1 - Online learning for optimistic planning
AU - Buşoniu, Lucian
AU - Daniels, Alexander
AU - Babuska, R.
PY - 2016
Y1 - 2016
AB - Markov decision processes are a powerful framework for nonlinear, possibly stochastic optimal control. We consider two existing optimistic planning algorithms to solve them, which originate in artificial intelligence. These algorithms have provable near-optimal performance when the actions and possible stochastic next-states are discrete, but they wastefully discard the planning data after each step. We therefore introduce a method to learn online, from this data, the upper bounds that are used to guide the planning process. Five different approximators for the upper bounds are proposed: one is specifically adapted to planning, and the other four come from the standard toolbox of function approximation. Our analysis characterizes the influence of the approximation error on the performance, and reveals that for small errors, learning-based planning performs better. In detailed experimental studies, learning leads to improved performance with all five representations, and a local variant of support vector machines provides a good compromise between performance and computation.
KW - Machine learning
KW - Markov decision processes
KW - Near-optimality analysis
KW - Optimal control
KW - Optimistic planning
UR - http://www.scopus.com/inward/record.url?scp=84976598540&partnerID=8YFLogxK
U2 - 10.1016/j.engappai.2016.05.003
DO - 10.1016/j.engappai.2016.05.003
M3 - Article
AN - SCOPUS:84976598540
SN - 0952-1976
VL - 55
SP - 70
EP - 82
JO - Engineering Applications of Artificial Intelligence
JF - Engineering Applications of Artificial Intelligence
ER -