TY - JOUR
T1 - Knowledge- and ambiguity-aware robot learning from corrective and evaluative feedback
AU - Celemin, Carlos
AU - Kober, Jens
PY - 2023
Y1 - 2023
AB - In order to deploy robots that could be adapted by non-expert users, interactive imitation learning (IIL) methods must be flexible regarding the interaction preferences of the teacher and avoid assumptions of perfect teachers (oracles), while considering they make mistakes influenced by diverse human factors. In this work, we propose an IIL method that improves the human–robot interaction for non-expert and imperfect teachers in two directions. First, uncertainty estimation is included to endow the agents with a lack of knowledge awareness (epistemic uncertainty) and demonstration ambiguity awareness (aleatoric uncertainty), such that the robot can request human input when it is deemed more necessary. Second, the proposed method enables the teachers to train with the flexibility of using corrective demonstrations, evaluative reinforcements, and implicit positive feedback. The experimental results show an improvement in learning convergence with respect to other learning methods when the agent learns from highly ambiguous teachers. Additionally, in a user study, it was found that the components of the proposed method improve the teaching experience and the data efficiency of the learning process.
KW - Active learning
KW - Corrective demonstrations
KW - Human reinforcement
KW - Interactive imitation learning
KW - Uncertainty
UR - http://www.scopus.com/inward/record.url?scp=85146280338&partnerID=8YFLogxK
U2 - 10.1007/s00521-022-08118-z
DO - 10.1007/s00521-022-08118-z
M3 - Article
AN - SCOPUS:85146280338
SN - 0941-0643
VL - 35
SP - 16821
EP - 16839
JO - Neural Computing and Applications
JF - Neural Computing and Applications
IS - 23
ER -