TY - JOUR
T1 - Facial feedback for reinforcement learning
T2 - a case study and offline analysis using the TAMER framework
AU - Li, Guangliang
AU - Dibeklioğlu, Hamdi
AU - Whiteson, Shimon
AU - Hung, Hayley
PY - 2020
Y1 - 2020
N2 - Interactive reinforcement learning provides a way for agents to learn to solve tasks from evaluative feedback provided by a human user. Previous research showed that humans give copious feedback early in training but very sparsely thereafter. In this article, we investigate the potential of agents learning from trainers' facial expressions by interpreting them as evaluative feedback. To do so, we implemented TAMER, a popular interactive reinforcement learning method, in a reinforcement learning benchmark problem, Infinite Mario, and conducted the first large-scale study of TAMER, involving 561 participants. Using a designed CNN–RNN model, our analysis shows that instructing trainers to use facial expressions and introducing competition can improve the accuracy of estimating positive and negative feedback from facial expressions. In addition, our results from a simulation experiment show that learning solely from predicted feedback based on facial expressions is possible, and that with stronger, more effective prediction models or a regression method, facial responses could significantly improve agent performance. Furthermore, our experiment supports previous studies demonstrating the importance of bi-directional feedback and competitive elements in the training interface.
AB - Interactive reinforcement learning provides a way for agents to learn to solve tasks from evaluative feedback provided by a human user. Previous research showed that humans give copious feedback early in training but very sparsely thereafter. In this article, we investigate the potential of agents learning from trainers' facial expressions by interpreting them as evaluative feedback. To do so, we implemented TAMER, a popular interactive reinforcement learning method, in a reinforcement learning benchmark problem, Infinite Mario, and conducted the first large-scale study of TAMER, involving 561 participants. Using a designed CNN–RNN model, our analysis shows that instructing trainers to use facial expressions and introducing competition can improve the accuracy of estimating positive and negative feedback from facial expressions. In addition, our results from a simulation experiment show that learning solely from predicted feedback based on facial expressions is possible, and that with stronger, more effective prediction models or a regression method, facial responses could significantly improve agent performance. Furthermore, our experiment supports previous studies demonstrating the importance of bi-directional feedback and competitive elements in the training interface.
KW - Facial expressions
KW - Human agent interaction
KW - Interactive reinforcement learning
KW - Reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85079570004&partnerID=8YFLogxK
U2 - 10.1007/s10458-020-09447-w
DO - 10.1007/s10458-020-09447-w
M3 - Article
AN - SCOPUS:85079570004
SN - 1387-2532
VL - 34
JO - Autonomous Agents and Multi-Agent Systems
JF - Autonomous Agents and Multi-Agent Systems
IS - 1
M1 - 22
ER -