
An Efficient Multi-Agent Policy Self-Play Learning Method Aiming at Seize-Control Scenarios.

Authors :
Zhang, Huaqing
Ma, Hongbin
Zhang, Xiaofei
Wang, Li
Han, Minglei
Chen, Hui
Ding, Ao
Source :
Unmanned Systems. Sep 2024, p1-18. 18p.
Publication Year :
2024

Abstract

Aiming at the problem of multi-agent cooperative confrontation in seize-control scenarios, we design an efficient multi-agent policy self-play (EMAP-SP) learning method. First, a multi-agent centralized policy model is constructed to command the agents to perform tasks cooperatively. Considering that the policy being trained and its historical policies usually have poor exploration capability under incomplete information in self-play training, an intrinsic reward mechanism based on random network distillation (RND) is introduced into the self-play learning method. In addition, we propose a multi-step on-policy deep reinforcement learning (DRL) algorithm assisted by off-policy policy evaluation (MSOAO) to learn the best response policy in self-play. Compared with DRL algorithms commonly used in complex decision problems, MSOAO has a more efficient policy evaluation capability, and efficient policy evaluation further improves policy learning. The effectiveness of EMAP-SP is fully verified in the MiaoSuan wargame simulation system, and the evaluation results show that EMAP-SP can learn a cooperative policy that effectively defeats the Blue side's knowledge-based policy under incomplete information. Moreover, evaluation results in DRL benchmark environments also show that the best response policy learning algorithm MSOAO enables agents to learn approximately optimal policies. [ABSTRACT FROM AUTHOR]
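The RND intrinsic-reward idea mentioned in the abstract can be illustrated with a minimal sketch: a fixed, randomly initialized target network and a trained predictor network map observations to embeddings, and the predictor's error on an observation serves as an exploration bonus that decays as that observation becomes familiar. This is a generic NumPy illustration of the RND mechanism (network sizes, learning rate, and function names are assumptions, not the paper's implementation).

```python
import numpy as np

rng = np.random.default_rng(0)

def make_net(in_dim, hid, out_dim, rng):
    # Parameters of a one-hidden-layer MLP with tanh activation.
    return {"W1": rng.normal(0, 0.5, (in_dim, hid)),
            "W2": rng.normal(0, 0.5, (hid, out_dim))}

def forward(net, x):
    h = np.tanh(x @ net["W1"])
    return h @ net["W2"]

in_dim, hid, out_dim = 4, 16, 8
target = make_net(in_dim, hid, out_dim, rng)     # fixed, never trained
predictor = make_net(in_dim, hid, out_dim, rng)  # trained to mimic target

def intrinsic_reward(obs):
    # Prediction error is large for novel observations, small for familiar ones.
    err = forward(predictor, obs) - forward(target, obs)
    return float(np.mean(err ** 2))

def train_predictor(obs, lr=1e-2):
    # One manual gradient step on the MSE toward the fixed target's output.
    h = np.tanh(obs @ predictor["W1"])
    pred = h @ predictor["W2"]
    g_out = 2.0 * (pred - forward(target, obs)) / out_dim
    g_h = (predictor["W2"] @ g_out) * (1 - h ** 2)   # backprop through tanh
    predictor["W2"] -= lr * np.outer(h, g_out)
    predictor["W1"] -= lr * np.outer(obs, g_h)

obs = rng.normal(size=in_dim)
before = intrinsic_reward(obs)
for _ in range(200):
    train_predictor(obs)
after = intrinsic_reward(obs)
# The bonus for a repeatedly visited observation shrinks, so the
# policy is pushed toward states it has not seen before.
```

In self-play under incomplete information, this bonus would be added to the environment reward so that both the training policy and its historical opponents keep exploring rather than settling on narrow behaviors.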

Details

Language :
English
ISSN :
2301-3850
Database :
Academic Search Index
Journal :
Unmanned Systems
Publication Type :
Academic Journal
Accession number :
179978104
Full Text :
https://doi.org/10.1142/s230138502550061x