Online learning for optimistic planning

Lucian Buşoniu*, Alexander Daniels, R. Babuska

*Corresponding author for this work

    Research output: Contribution to journal, Article, Scientific, peer-review

    2 Citations (Scopus)


    Markov decision processes are a powerful framework for nonlinear, possibly stochastic optimal control. We consider two existing optimistic planning algorithms to solve them, which originate in artificial intelligence. These algorithms have provable near-optimal performance when the actions and possible stochastic next-states are discrete, but they wastefully discard the planning data after each step. We therefore introduce a method to learn online, from this data, the upper bounds that are used to guide the planning process. Five different approximators for the upper bounds are proposed: one is specifically adapted to planning, and the other four come from the standard toolbox of function approximation. Our analysis characterizes the influence of the approximation error on the performance, and reveals that for small errors, learning-based planning performs better. In detailed experimental studies, learning leads to improved performance with all five representations, and a local variant of support vector machines provides a good compromise between performance and computation.

    Original language: English
    Pages (from-to): 70-82
    Journal: Engineering Applications of Artificial Intelligence
    Publication status: Published - 2016


    • Machine learning
    • Markov decision processes
    • Near-optimality analysis
    • Optimal control
    • Optimistic planning

