Towards Learning from Implicit Human Reward (Extended Abstract)

Guangliang Li; Hamdi Dibeklioglu; S Whiteson; Hayley Hung

Abstract

The TAMER framework provides a way for agents to learn to solve tasks using human-generated rewards. Previous research showed that humans give copious feedback early in training but very sparsely thereafter, and that an agent's competitive feedback (informing the trainer about its performance relative to other trainers) can greatly affect the trainer's engagement and the agent's learning. In this paper, we present the first large-scale study of TAMER, involving 561 subjects, which investigates the effect of the agent's competitive feedback in a new setting as well as the potential for learning from trainers' facial expressions. Our results show for the first time that a TAMER agent can successfully learn to play Infinite Mario, a challenging reinforcement-learning benchmark problem. In addition, our study supports prior results demonstrating the importance of bi-directional feedback and competitive elements in the training interface. Finally, our results shed light on the potential for using trainers' facial expressions as reward signals, as well as the role of age and gender in trainer behavior and agent performance.

Original language	English
Pages	1353-1354
Number of pages	2
Publication status	Published - 2016
Event	AAMAS 2016: 15th International Conference on Autonomous Agents and Multiagent Systems, Singapore, Singapore
Duration	9 May 2016 → 13 May 2016
Conference number	15
Event website	https://sis.smu.edu.sg/aamas2016

Conference

Conference	AAMAS 2016
Abbreviated title	AAMAS
Country/Territory	Singapore
City	Singapore
Period	9/05/16 → 13/05/16
Internet address	https://sis.smu.edu.sg/aamas2016

Keywords

Reinforcement learning
Human-agent interaction

Cite this

@conference{cc113785e68945199d0cdc3dfaf49acc,

title = "Towards Learning from Implicit Human Reward: (Extended Abstract)",

abstract = "The TAMER framework provides a way for agents to learn to solve tasks using human-generated rewards. Previous research showed that humans give copious feedback early in training but very sparsely thereafter and that an agent's competitive feedback --- informing the trainer about its performance relative to other trainers --- can greatly affect the trainer's engagement and the agent's learning. In this paper, we present the first large-scale study of TAMER, involving 561 subjects, which investigates the effect of the agent's competitive feedback in a new setting as well as the potential for learning from trainers' facial expressions. Our results show for the first time that a TAMER agent can successfully learn to play Infinite Mario, a challenging reinforcement-learning benchmark problem. In addition, our study supports prior results demonstrating the importance of bi-directional feedback and competitive elements in the training interface. Finally, our results shed light on the potential for using trainers' facial expressions as reward signals, as well as the role of age and gender in trainer behavior and agent performance.",

keywords = "Reinforcement learning, human agent interaction",

author = "Guangliang Li and Hamdi Dibeklioglu and S Whiteson and Hayley Hung",

year = "2016",

language = "English",

pages = "1353--1354",

note = "AAMAS 2016 : 15th International Conference on Autonomous Agents and Multiagent Systems, AAMAS ; Conference date: 09-05-2016 Through 13-05-2016",

url = "https://sis.smu.edu.sg/aamas2016",

}

TY - CONF

T1 - Towards Learning from Implicit Human Reward

T2 - AAMAS 2016

AU - Li, Guangliang

AU - Dibeklioglu, Hamdi

AU - Whiteson, S

AU - Hung, Hayley

N1 - Conference code: 15

PY - 2016

Y1 - 2016

AB - The TAMER framework provides a way for agents to learn to solve tasks using human-generated rewards. Previous research showed that humans give copious feedback early in training but very sparsely thereafter and that an agent's competitive feedback --- informing the trainer about its performance relative to other trainers --- can greatly affect the trainer's engagement and the agent's learning. In this paper, we present the first large-scale study of TAMER, involving 561 subjects, which investigates the effect of the agent's competitive feedback in a new setting as well as the potential for learning from trainers' facial expressions. Our results show for the first time that a TAMER agent can successfully learn to play Infinite Mario, a challenging reinforcement-learning benchmark problem. In addition, our study supports prior results demonstrating the importance of bi-directional feedback and competitive elements in the training interface. Finally, our results shed light on the potential for using trainers' facial expressions as reward signals, as well as the role of age and gender in trainer behavior and agent performance.

KW - Reinforcement learning

KW - human agent interaction

UR - http://dl.acm.org/citation.cfm?id=2937156

M3 - Abstract

SP - 1353

EP - 1354

Y2 - 9 May 2016 through 13 May 2016

ER -
