Training for Implicit Norms in Deep Reinforcement Learning Agents through Adversarial Multi-Objective Reward Optimization

Markus Peschl*

*Corresponding author for this work

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

3 Citations (Scopus)
103 Downloads (Pure)

Abstract

We propose a deep reinforcement learning algorithm that employs an adversarial training strategy to adhere to implicit human norms while optimizing for a narrow goal objective. Previous methods that incorporate human values into reinforcement learning algorithms either scale poorly or assume hand-crafted state features. Our algorithm drops these assumptions and automatically infers norms from human demonstrations, which allows it to be integrated into existing agents in the form of multi-objective optimization. We benchmark our approach in a search-and-rescue grid world and show that, conditioned on respecting human norms, our agent maintains optimal performance with respect to the predefined goal.
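The paper itself does not include code; the following is a minimal sketch of the idea described in the abstract, assuming a discriminator-style norm-reward model learned adversarially from human demonstrations and a linear scalarization of the goal and norm objectives. All names here (NormDiscriminator, norm_reward, combined_reward, w_goal, w_norm) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: names and architecture are illustrative, not from the paper.
import torch
import torch.nn as nn


class NormDiscriminator(nn.Module):
    """Classifies (state, action) pairs as human-demonstrated vs. agent-generated.
    Its log-probability of 'human-like' serves as an implicit norm reward."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        # Returns logits; sigmoid gives P(pair comes from human demonstrations).
        return self.net(torch.cat([obs, act], dim=-1))


def norm_reward(disc: NormDiscriminator, obs, act):
    # Higher when the agent's behavior looks like the human demonstrations.
    with torch.no_grad():
        return torch.nn.functional.logsigmoid(disc(obs, act)).squeeze(-1)


def combined_reward(goal_r, norm_r, w_goal=1.0, w_norm=1.0):
    # Linear scalarization of the two objectives; the trade-off weights (or a
    # constrained formulation) encode "respect norms while pursuing the goal".
    return w_goal * goal_r + w_norm * norm_r


def discriminator_loss(disc, expert_obs, expert_act, agent_obs, agent_act):
    # Standard adversarial (GAN-style) objective: expert pairs -> 1, agent pairs -> 0.
    bce = nn.BCEWithLogitsLoss()
    expert_logits = disc(expert_obs, expert_act)
    agent_logits = disc(agent_obs, agent_act)
    return (bce(expert_logits, torch.ones_like(expert_logits))
            + bce(agent_logits, torch.zeros_like(agent_logits)))
```

In such a setup, any existing policy-gradient agent could be trained on combined_reward while the discriminator is updated in alternation on fresh agent rollouts and the demonstration data; this is one plausible reading of "integrating it into existing agents in the form of multi-objective optimization", not a description of the authors' exact method.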

Original language: English
Title of host publication: AIES 2021 - Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society
Publisher: Association for Computing Machinery (ACM)
Pages: 275-276
Number of pages: 2
ISBN (Electronic): 9781450384735
DOIs
Publication status: Published - 2021
Event: 4th AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society, AIES 2021 - Virtual, Online, United States
Duration: 19 May 2021 – 21 May 2021

Publication series

Name: AIES 2021 - Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society

Conference

Conference: 4th AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society, AIES 2021
Country/Territory: United States
City: Virtual, Online
Period: 19/05/21 – 21/05/21

Keywords

  • deep learning
  • inverse reinforcement learning
  • multi-objective optimization
  • value alignment
