Distributed Policy Evaluation Under Multiple Behavior Strategies.
- Source :
- IEEE Transactions on Automatic Control. May 2015, Vol. 60 Issue 5, p1260-1274. 15p.
- Publication Year :
- 2015
Abstract
- We apply diffusion strategies to develop a fully-distributed cooperative reinforcement learning algorithm in which agents in a network communicate only with their immediate neighbors to improve predictions about their environment. The algorithm can also be applied to off-policy learning, meaning that the agents can predict the response to a behavior different from the actual policies they are following. The proposed distributed strategy is efficient, with linear complexity in both computation time and memory footprint. We provide a mean-square-error performance analysis and establish convergence under constant step-size updates, which endow the network with continuous learning capabilities. The results show a clear gain from cooperation: when the individual agents can estimate the solution, cooperation increases stability and reduces bias and variance of the prediction error; but, more importantly, the network is able to approach the optimal solution even when none of the individual agents can (e.g., when the individual behavior policies restrict each agent to sample a small portion of the state space). [ABSTRACT FROM AUTHOR]
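Illustrative sketch (not part of the record): the abstract describes agents that each learn from their own off-policy samples and then exchange estimates only with immediate neighbors. One way to picture this is an adapt-then-combine diffusion step wrapped around a GTD2-style, importance-weighted local update with linear features. Everything below is an assumption made for illustration: the names `Sample`, `local_gtd2_step`, `combine`, and `diffusion_step`, the choice to weight both updates by `rho`, the choice to combine the auxiliary vector `w` as well, and the combination matrix `C` (entry `C[l][k]` is the weight agent `k` gives to agent `l`, zero for non-neighbors, columns summing to one). The article's exact recursion may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]


@dataclass
class Sample:
    x: Vector       # feature vector of the current state
    x_next: Vector  # feature vector of the next state
    reward: float
    rho: float      # importance weight: target-policy prob / behavior-policy prob


def dot(a: Vector, b: Vector) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def local_gtd2_step(theta: Vector, w: Vector, s: Sample,
                    gamma: float, alpha: float, beta: float) -> Tuple[Vector, Vector]:
    """Adapt step: one importance-weighted GTD2-style update from a single sample."""
    delta = s.reward + gamma * dot(s.x_next, theta) - dot(s.x, theta)  # TD error
    xw = dot(s.x, w)
    new_theta = [t + alpha * s.rho * (xi - gamma * xn) * xw
                 for t, xi, xn in zip(theta, s.x, s.x_next)]
    new_w = [wi + beta * s.rho * (delta - xw) * xi for wi, xi in zip(w, s.x)]
    return new_theta, new_w


def combine(vectors: List[Vector], C: List[List[float]], k: int) -> Vector:
    """Combine step for agent k: weighted average of the agents' intermediate vectors."""
    dim = len(vectors[0])
    return [sum(C[l][k] * vectors[l][i] for l in range(len(vectors)))
            for i in range(dim)]


def diffusion_step(thetas: List[Vector], ws: List[Vector], samples: List[Sample],
                   C: List[List[float]], gamma: float, alpha: float,
                   beta: float) -> Tuple[List[Vector], List[Vector]]:
    """One network iteration: every agent adapts locally, then averages with neighbors."""
    adapted = [local_gtd2_step(t, w, s, gamma, alpha, beta)
               for t, w, s in zip(thetas, ws, samples)]
    psi_theta = [a[0] for a in adapted]
    psi_w = [a[1] for a in adapted]
    n = len(thetas)
    return ([combine(psi_theta, C, k) for k in range(n)],
            [combine(psi_w, C, k) for k in range(n)])


if __name__ == "__main__":
    # Three fully connected agents with uniform weights (hypothetical toy data).
    C = [[1.0 / 3.0] * 3 for _ in range(3)]
    s = Sample(x=[1.0, 0.0], x_next=[0.0, 1.0], reward=1.0, rho=2.0)
    thetas = [[0.0, 0.0] for _ in range(3)]
    ws = [[0.0, 0.0] for _ in range(3)]
    for _ in range(100):
        thetas, ws = diffusion_step(thetas, ws, [s, s, s], C, 0.5, 0.1, 0.1)
    print(thetas[0])
```

Each agent's cost per iteration is linear in the feature dimension times its number of neighbors, which is consistent with the linear complexity the abstract claims. The constant step sizes `alpha` and `beta` mirror the constant step-size setting mentioned there.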
Details
- Language :
- English
- ISSN :
- 00189286
- Volume :
- 60
- Issue :
- 5
- Database :
- Academic Search Index
- Journal :
- IEEE Transactions on Automatic Control
- Publication Type :
- Periodical
- Accession number :
- 102229142
- Full Text :
- https://doi.org/10.1109/TAC.2014.2368731