
Entropy regularized actor-critic based multi-agent deep reinforcement learning for stochastic games.

Authors :
Hao, Dong
Zhang, Dongcheng
Shi, Qi
Li, Kai
Source :
Information Sciences. Dec 2022, Vol. 617, p17-40. 24p.
Publication Year :
2022

Abstract

Multi-agent reinforcement learning (MARL) is an abstract framework for modeling a dynamic environment that involves multiple learning and decision-making agents, each of which tries to maximize her cumulative reward. In MARL, each agent discovers a strategy alongside others and adapts her policy in response to the behavioural changes of others. A fundamental difficulty in MARL is that every agent is simultaneously learning and changing to improve her reward, making the whole system unstable and hindering the convergence of agents' policies. In this paper, we introduce an entropy regularizer into the Bellman equation and use the Lagrange approach to optimize it. We then propose a MARL algorithm based on the maximum entropy principle and the actor-critic method; it follows the policy gradient approach and uses a policy network and a value network. We call it Multi-Agent Deep Soft Policy Gradient (MADSPG). Then, using the Lagrange approach and dynamic minimax optimization, we propose AUTO-MADSPG, a variant with an automatically adjusted entropy regularizer. These algorithms make multi-agent learning more stable while guaranteeing sufficient exploration. Finally, we incorporate MADSPG and a recently proposed opponent-modeling component into an integrated framework, which outperforms many state-of-the-art MARL algorithms in conventional cooperative and competitive game settings. [ABSTRACT FROM AUTHOR]
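For readers who want a concrete picture of the "automatically adjusted entropy regularizer," the sketch below shows a SAC-style dual (Lagrange) update of the entropy coefficient alpha, which is the standard way such an entropy constraint is optimized; it is an illustration under that assumption, not the paper's own code. In the maximum entropy setting, the soft value backup takes the form V(s) = E_{a~pi}[Q(s, a) - alpha * log pi(a|s)], and alpha is raised or lowered so that the policy entropy tracks a target. All identifiers below (log_alpha, target_entropy, update_alpha) are illustrative, not from the paper.

import torch

# Minimal sketch: dual-ascent tuning of the entropy coefficient alpha,
# assuming a SAC-style formulation. Requires PyTorch.

log_alpha = torch.zeros(1, requires_grad=True)   # alpha = exp(log_alpha) > 0
alpha_opt = torch.optim.Adam([log_alpha], lr=3e-4)
target_entropy = -2.0  # common heuristic: -dim(action space)

def update_alpha(log_probs: torch.Tensor) -> float:
    """One Lagrange-multiplier step on the entropy constraint.

    log_probs: log pi(a|s) for a batch of actions sampled from the
    current policy. When policy entropy (-log_probs on average) drops
    below target_entropy, the gradient pushes alpha up, encouraging
    exploration; otherwise alpha decays.
    """
    alpha_loss = -(log_alpha.exp() * (log_probs + target_entropy).detach()).mean()
    alpha_opt.zero_grad()
    alpha_loss.backward()
    alpha_opt.step()
    return log_alpha.exp().item()

In a multi-agent setting of the kind the abstract describes, each agent would typically hold its own log_alpha and run this step alongside its actor and critic updates.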

Details

Language :
English
ISSN :
0020-0255
Volume :
617
Database :
Academic Search Index
Journal :
Information Sciences
Publication Type :
Periodical
Accession number :
161014296
Full Text :
https://doi.org/10.1016/j.ins.2022.10.022