Synthesising Reinforcement Learning Policies Through Set-Valued Inductive Rule Learning

Youri Coppens*, Denis Steckelmacher, Catholijn M. Jonker, Ann Nowé

*Corresponding author for this work

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review


Abstract

Today’s advanced Reinforcement Learning algorithms produce black-box policies that are often difficult for a person to interpret and trust. We introduce a policy distillation algorithm, building on the CN2 rule mining algorithm, that distills the policy into a rule-based decision system. At the core of our approach is the fact that an RL process does not just learn a policy, a mapping from states to actions, but also produces extra meta-information, such as action values indicating the quality of alternative actions. This meta-information can indicate whether more than one action is near-optimal for a certain state. We extend CN2 so that it can leverage knowledge about equally good actions to distill the policy into fewer rules, making the result easier for a person to interpret. Then, to ensure that the rules explain a valid, non-degenerate policy, we introduce a refinement algorithm that fine-tunes the rules to obtain good performance when executed in the environment. We demonstrate the applicability of our algorithm on the Mario AI benchmark, a complex task that requires modern reinforcement learning algorithms including neural networks. The explanations we produce capture the learned policy in only a few rules that allow a person to understand what the black-box agent learned. Source code: https://gitlab.ai.vub.ac.be/yocoppen/svcn2.
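
The idea of using action values to find several near-optimal actions per state can be illustrated with a minimal Python sketch. It is not taken from the paper or its source code: the function names, the `tolerance` threshold, and the example action values are assumptions made only for illustration, and the criterion the authors actually use may differ.

```python
from typing import Dict, Set


def near_optimal_actions(q_values: Dict[str, float], tolerance: float = 0.05) -> Set[str]:
    """Return every action whose value is within `tolerance` of the best one.

    `tolerance` is a hypothetical threshold chosen for this sketch.
    """
    best = max(q_values.values())
    return {action for action, q in q_values.items() if best - q <= tolerance}


def rule_agrees(predicted_action: str, acceptable: Set[str]) -> bool:
    """A rule is counted as correct on a state if it predicts ANY acceptable action."""
    return predicted_action in acceptable


# Made-up action values for a single state.
q = {"left": 1.00, "right": 0.98, "jump": 0.40}
acceptable = near_optimal_actions(q)
print(sorted(acceptable))
print(rule_agrees("right", acceptable), rule_agrees("jump", acceptable))
```

With set-valued labels of this kind, a single rule predicting `right` is counted as correct on states where `left` is the greedy action but `right` is nearly as good. This is the intuition behind distilling the policy into fewer rules.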

Original language: English
Title of host publication: Trustworthy AI – Integrating Learning, Optimization and Reasoning - First International Workshop, TAILOR 2020, Revised Selected Papers
Editors: Fredrik Heintz, Michela Milano, Barry O’Sullivan
Publisher: Springer
Pages: 163-179
Number of pages: 17
ISBN (Print): 9783030739584
DOIs
Publication status: Published - 2021
Event: 1st International Workshop on Trustworthy AI – Integrating Learning, Optimization and Reasoning, TAILOR 2020 held as a part of European Conference on Artificial Intelligence, ECAI 2020 - Virtual, Online
Duration: 4 Sep 2020 to 5 Sep 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12641 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 1st International Workshop on Trustworthy AI – Integrating Learning, Optimization and Reasoning, TAILOR 2020 held as a part of European Conference on Artificial Intelligence, ECAI 2020
City: Virtual, Online
Period: 4/09/20 to 5/09/20

Bibliographical note

Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

  • Explainable AI
  • Inductive rule learning
  • Policy distillation
  • Reinforcement Learning
