Structure Learning for Safe Policy Improvement

Research output: Chapter in Book/Conference proceedings/Edited volumeConference contributionScientificpeer-review

3 Citations (Scopus)

Abstract

We investigate how Safe Policy Improvement (SPI) algorithms can exploit the structure of factored Markov decision processes when such structure is unknown a priori. To facilitate the application of reinforcement learning in the real world, SPI provides probabilistic guarantees that policy changes in a running process will improve the performance of this process. However, current SPI algorithms have requirements that might be impractical, such as: (i) availability of a large amount of historical data, or (ii) prior knowledge of the underlying structure. To overcome these limitations we enhance a Factored SPI (FSPI) algorithm with different structure learning methods. The resulting algorithms need fewer samples to improve the policy and require weaker prior knowledge assumptions. In well-factorized domains, the proposed algorithms improve performance significantly compared to a flat SPI algorithm, demonstrating a sample complexity closer to an FSPI algorithm that knows the structure. This indicates that the combination of FSPI and structure learning algorithms is a promising solution to real-world problems involving many variables.
Original languageEnglish
Title of host publicationProceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
EditorsS. Kraus
PublisherInternational Joint Conferences on Artifical Intelligence (IJCAI)
Pages3453-3459
Number of pages7
ISBN (Electronic)978-0-9992411-4-1
DOIs
Publication statusPublished - Jul 2019
Event28th International Joint Conference on Artificial Intelligence, IJCAI 2019 - Macao, China
Duration: 10 Aug 201916 Aug 2019

Conference

Conference28th International Joint Conference on Artificial Intelligence, IJCAI 2019
CountryChina
CityMacao
Period10/08/1916/08/19

Fingerprint

Dive into the research topics of 'Structure Learning for Safe Policy Improvement'. Together they form a unique fingerprint.
  • Safe Policy Improvement with an Estimated Baseline Policy

    Simão, T. D., Laroche, R. & Tachet des Combes, R., 2020, Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems. Richland, SC, p. 1269–1277 9 p. (AAMAS '20).

    Research output: Chapter in Book/Conference proceedings/Edited volumeConference contributionScientificpeer-review

    Open Access
    File
  • Safe Policy Improvement with Baseline Bootstrapping in Factored Environments

    Simão, T. D. & Spaan, M. T. J., 2019, 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019. American Association for Artificial Intelligence (AAAI), p. 4967-4974 8 p. (33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019).

    Research output: Chapter in Book/Conference proceedings/Edited volumeConference contributionScientificpeer-review

    8 Citations (Scopus)

Cite this