
Reusable Reinforcement Learning via Shallow Trails.

Authors :
Yu, Yang
Chen, Shi-Yong
Da, Qing
Zhou, Zhi-Hua
Source :
IEEE Transactions on Neural Networks & Learning Systems; Jun 2018, Vol. 29, Issue 6, p2204-2215, 12p
Publication Year :
2018

Abstract

Reinforcement learning has shown great success in enabling learning agents to accomplish tasks autonomously through environment interactions. In many real-world applications, however, an agent needs to accomplish not a single fixed task but a range of tasks. To this end, an agent can learn a metapolicy over a set of training tasks drawn from an underlying distribution; by maximizing the total reward summed over all training tasks, the metapolicy can then be reused to accomplish test tasks from the same distribution. In practice, however, two major obstacles stand in the way of training and reusing metapolicies well. First, how can we identify tasks that are unrelated or even opposed to each other, so as to avoid their mutual interference during training? Second, how can we characterize task features according to which a metapolicy can be reused? In this paper, we propose the MetA-Policy LEarning (MAPLE) approach, which overcomes both difficulties by introducing the shallow trail: a task is probed by running a roughly trained policy on it. Using the rewards of the shallow trail, MAPLE automatically groups similar tasks. Moreover, when the task parameters are unknown, the shallow-trail rewards also serve as task features. Empirical studies on several control tasks verify that MAPLE trains metapolicies well and achieves high rewards on test tasks. [ABSTRACT FROM AUTHOR]
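To make the shallow-trail idea concrete, the following is a minimal, hypothetical sketch of the workflow described in the abstract: probe each task with roughly trained policies, use the resulting reward vector as a task feature, group tasks by those features, fit one metapolicy per group, and at test time dispatch an unseen task to the metapolicy of its nearest group. The environment (ToyTask), the probe policies, and the training routines below are illustrative placeholders, not the authors' implementation.

```python
# Hedged sketch of shallow-trail-based metapolicy learning (not the paper's code).
import numpy as np

rng = np.random.default_rng(0)

class ToyTask:
    """Hypothetical 1-D task: reward is higher when the action matches a hidden parameter."""
    def __init__(self, param):
        self.param = param

    def rollout(self, policy, horizon=20):
        # Total reward from running a scalar "policy" (an action bias) on this task.
        actions = policy + 0.1 * rng.standard_normal(horizon)
        return float(-np.mean((actions - self.param) ** 2))

def shallow_trail(task, rough_policies):
    """Probe a task with a few roughly trained policies; the reward vector is the task feature."""
    return np.array([task.rollout(p) for p in rough_policies])

def kmeans(features, k, iters=50):
    """Plain k-means to group tasks by their shallow-trail features."""
    centers = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((features[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([features[labels == j].mean(0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return centers, labels

def fit_metapolicy(tasks):
    """Stand-in for metapolicy training: pick the scalar action maximizing average reward."""
    if not tasks:
        return 0.0
    candidates = np.linspace(-1, 1, 41)
    scores = [np.mean([t.rollout(a) for t in tasks]) for a in candidates]
    return float(candidates[int(np.argmax(scores))])

# Training phase: draw tasks, probe them, group them, fit one metapolicy per group.
train_tasks = [ToyTask(p) for p in rng.uniform(-1, 1, size=30)]
rough_policies = [-0.5, 0.0, 0.5]  # crude probe policies (assumed for illustration)
features = np.stack([shallow_trail(t, rough_policies) for t in train_tasks])
centers, labels = kmeans(features, k=3)
metapolicies = [fit_metapolicy([t for t, l in zip(train_tasks, labels) if l == j])
                for j in range(3)]

# Test phase: probe an unseen task, match it to the nearest group, reuse that metapolicy.
test_task = ToyTask(rng.uniform(-1, 1))
f = shallow_trail(test_task, rough_policies)
group = int(np.argmin(((centers - f) ** 2).sum(-1)))
print("test reward:", test_task.rollout(metapolicies[group]))
```

The key design point mirrored here is that the shallow-trail rewards play a double role: they both determine which tasks are grouped together during training and, at test time, act as the task feature used to select which metapolicy to reuse when task parameters are unknown.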

Details

Language :
English
ISSN :
2162-237X
Volume :
29
Issue :
6
Database :
Complementary Index
Journal :
IEEE Transactions on Neural Networks & Learning Systems
Publication Type :
Periodical
Accession number :
129655432
Full Text :
https://doi.org/10.1109/TNNLS.2018.2803729