Safe and Sample-Efficient Reinforcement Learning Algorithms for Factored Environments

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

Reinforcement Learning (RL) deals with problems that can be modeled as a Markov Decision Process (MDP) where the transition function is unknown. In situations where an arbitrary policy π is already in execution and the experiences with the environment were recorded in a batch D, an RL algorithm can use D to compute a new policy π′. However, the policy computed by traditional RL algorithms might have worse performance compared to π. Our goal is to develop safe RL algorithms, where the agent has a high confidence that the performance of π′ is better than the performance of π given D. To develop sample-efficient and safe RL algorithms we combine ideas from exploration strategies in RL with a safe policy improvement method.
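
As an illustration only, the following is a minimal sketch of one way a new policy can be kept close to π where the batch D offers little evidence, loosely in the spirit of the baseline-bootstrapping idea named in the related outputs. It is not the algorithm of this paper. The function names, the count threshold `n_min`, and the precomputed value estimates `q_values` are all hypothetical, and the sketch assumes a small discrete state and action space.

```python
from collections import Counter


def count_pairs(batch):
    """Count how often each (state, action) pair appears in the batch D.

    batch: iterable of (state, action, reward, next_state) tuples.
    """
    return Counter((s, a) for (s, a, _r, _s_next) in batch)


def constrained_greedy_step(baseline, q_values, counts, n_min):
    """One improvement step that deviates from the baseline policy only on
    (state, action) pairs observed at least n_min times (assumed threshold).

    baseline: dict state -> dict action -> probability (the policy pi)
    q_values: dict state -> dict action -> estimated value (assumed given)
    counts:   Counter over (state, action) pairs, e.g. from count_pairs
    """
    new_policy = {}
    for s, probs in baseline.items():
        trusted = [a for a in probs if counts[(s, a)] >= n_min]
        # Rarely observed actions keep exactly the baseline probability.
        new_probs = {a: (0.0 if a in trusted else p) for a, p in probs.items()}
        if trusted:
            # The mass the baseline put on trusted actions moves to the
            # trusted action with the highest estimated value.
            free_mass = sum(probs[a] for a in trusted)
            best = max(trusted, key=lambda a: q_values[s][a])
            new_probs[best] = free_mass
        new_policy[s] = new_probs
    return new_policy
```

In this sketch an action with a high estimated value but few samples cannot attract extra probability, so the new policy equals π wherever D is uninformative. The choice of `n_min` controls the trade-off between safety and improvement.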

Original language: English
Title of host publication: Proceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI 2019
Editors: Sarit Kraus
Publisher: International Joint Conferences on Artificial Intelligence (IJCAI)
Pages: 6460-6461
Number of pages: 2
ISBN (Electronic): 978-0-9992411-4-1
DOIs
Publication status: Published - 2019
Event: IJCAI 2019: 28th International Joint Conference on Artificial Intelligence - Macao, China
Duration: 10 Aug 2019 to 16 Aug 2019

Publication series

Name: IJCAI International Joint Conference on Artificial Intelligence
Volume: 2019-August
ISSN (Print): 1045-0823

Conference

Conference: IJCAI 2019
Country: China
City: Macao
Period: 10/08/19 to 16/08/19

  • Safe Policy Improvement with an Estimated Baseline Policy

    Simão, T. D., Laroche, R. & Tachet des Combes, R., 2020, Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems. Richland, SC, p. 1269–1277 9 p. (AAMAS '20).

    Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

  • Safe Policy Improvement with Baseline Bootstrapping in Factored Environments

    Simão, T. D. & Spaan, M. T. J., 2019, 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019. American Association for Artificial Intelligence (AAAI), p. 4967-4974 8 p.

    Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

    8 Citations (Scopus)
