A3C-GS: Adaptive Moment Gradient Sharing With Locks for Asynchronous Actor–Critic Agents.

Authors :
Labao, Alfonso B.
Martija, Mygel Andrei M.
Naval, Prospero C.
Source :
IEEE Transactions on Neural Networks & Learning Systems. Mar 2021, Vol. 32, Issue 3, pp. 1162-1176 (15 pages).
Publication Year :
2021

Abstract

We propose an asynchronous gradient sharing mechanism for the parallel actor–critic algorithms with improved exploration characteristics. The proposed algorithm (A3C-GS) has the property of automatically diversifying worker policies in the short term for exploration, thereby reducing the need for entropy loss terms. Despite policy diversification, the algorithm converges to the optimal policy in the long term. We show in our analysis that the gradient sharing operation is a composition of two contractions. The first contraction performs gradient computation, while the second contraction is a gradient sharing operation coordinated by locks. From these two contractions, certain short- and long-term properties result. For the short term, gradient sharing induces temporary heterogeneity in policies for performing needed exploration. In the long term, under a suitably small learning rate and gradient clipping, convergence to the optimal policy is theoretically guaranteed. We verify our results with several high-dimensional experiments and compare A3C-GS against other on-policy policy-gradient algorithms. Our proposed algorithm achieved the highest weighted score. Despite lower entropy weights, it performed well in high-dimensional environments that require exploration due to sparse rewards and those that need navigation in 3-D environments for long survival tasks. It consistently performed better than the base asynchronous advantage actor–critic (A3C) algorithm. [ABSTRACT FROM AUTHOR]
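The abstract describes workers that compute gradients and then share them through an operation coordinated by locks, with gradient clipping and a small learning rate. The following is only a minimal illustrative sketch of that general pattern, not the A3C-GS algorithm from the paper. Everything in it is an assumption made for illustration: the toy quadratic objective, the Adam-style moment update, and the names `SharedAdam`, `clip_by_norm`, `toy_gradient`, `worker`, and `run`.

```python
import threading


class SharedAdam:
    """Shared parameters and Adam-style moments guarded by one lock (hypothetical)."""

    def __init__(self, params, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
        self.params = list(params)
        self.m = [0.0] * len(self.params)  # first-moment estimates
        self.v = [0.0] * len(self.params)  # second-moment estimates
        self.t = 0                         # number of shared updates applied
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.lock = threading.Lock()

    def snapshot(self):
        """Return a copy of the shared parameters for a worker to use locally."""
        with self.lock:
            return list(self.params)

    def share(self, grad):
        """Apply one worker gradient to the shared moments and parameters."""
        with self.lock:
            self.t += 1
            for i, g in enumerate(grad):
                self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
                self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
                m_hat = self.m[i] / (1 - self.beta1 ** self.t)
                v_hat = self.v[i] / (1 - self.beta2 ** self.t)
                self.params[i] -= self.lr * m_hat / (v_hat ** 0.5 + self.eps)


def clip_by_norm(grad, max_norm):
    """Rescale grad so that its Euclidean norm does not exceed max_norm."""
    norm = sum(g * g for g in grad) ** 0.5
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


# Assumed toy objective: 0.5 * ||params - TARGET||^2 stands in for an RL loss.
TARGET = [1.0, -2.0]


def toy_gradient(params):
    return [p - t for p, t in zip(params, TARGET)]


def worker(shared, steps, max_norm):
    """Each worker reads a possibly stale snapshot, then shares a clipped gradient."""
    for _ in range(steps):
        local = shared.snapshot()
        grad = clip_by_norm(toy_gradient(local), max_norm)
        shared.share(grad)


def run(num_workers=4, steps=2000, max_norm=1.0):
    shared = SharedAdam([0.0, 0.0])
    threads = [
        threading.Thread(target=worker, args=(shared, steps, max_norm))
        for _ in range(num_workers)
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return shared


if __name__ == "__main__":
    result = run()
    print(result.params)
```

In this sketch the lock only serializes the update of the shared moments and parameters, so workers can still act on slightly different snapshots between updates. How A3C-GS itself structures its sharing step and obtains its convergence guarantee is specified in the full paper, not here.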

Details

Language :
English
ISSN :
2162-237X
Volume :
32
Issue :
3
Database :
Academic Search Index
Journal :
IEEE Transactions on Neural Networks & Learning Systems
Publication Type :
Periodical
Accession number :
149122060
Full Text :
https://doi.org/10.1109/TNNLS.2020.2980743