
Combination of learning from non-optimal demonstrations and feedbacks using inverse reinforcement learning and Bayesian policy improvement.

Authors :
Ezzeddine, Ali
Mourad, Nafee
Araabi, Babak Nadjar
Ahmadabadi, Majid Nili
Source :
Expert Systems with Applications. Dec2018, Vol. 112, p331-341. 11p.
Publication Year :
2018

Abstract

Inverse reinforcement learning (IRL) is a powerful tool for teaching by demonstrations, provided that sufficiently diverse and optimal demonstrations are given, and learner agent correctly perceives those demonstrations. These conditions are hard to meet in practice; as a trainer cannot cover all possibilities by demonstrations, he may partially fail to follow the optimal behavior. Also, trainer and learner have different perceptions of the environment including trainer's actions. A practical way to overcome these problems is using a combination of trainer's demonstrations and feedbacks. We propose an interactive learning approach to overcome the challenge of non-optimal demonstrations by integrating human evaluative feedbacks with the IRL process, given sufficiently diverse demonstrations and the domain transition model. To this end, we develop a probabilistic model of human feedbacks and iteratively improve the agent policy using Bayes rule. We then integrate this information in an extended IRL algorithm to enhance the learned reward function. We examine the developed approach in one experimental and two simulated tasks; i.e., a grid world navigation, a highway car driving system and a navigation task by the e-puck robot. Obtained results show significant improved efficiency of the proposed approach in face of having different levels of non-optimality in demonstrations and the number of evaluative feedbacks. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
09574174
Volume :
112
Database :
Academic Search Index
Journal :
Expert Systems with Applications
Publication Type :
Academic Journal
Accession number :
131185237
Full Text :
https://doi.org/10.1016/j.eswa.2018.06.035