1. Multi-UAV pursuit-evasion gaming based on PSO-M3DDPG schemes
- Author
- Yaozhong Zhang, Meiyan Ding, Jiandong Zhang, Qiming Yang, Guoqing Shi, Meiqu Lu, and Frank Jiang
- Subjects
Pursuit-evasion game, Particle swarm optimization algorithm, Reinforcement learning, M3DDPG (mini-max-multi-agent deep deterministic policy gradient), Electronic computers. Computer science, QA75.5-76.95, Information technology, T58.5-58.64
- Abstract
The sample data for reinforcement learning algorithms are often sparse and unstable, which makes training prone to falling into local optima. The Mini-Max-Multi-agent Deep Deterministic Policy Gradient (M3DDPG) algorithm is a multi-agent reinforcement learning algorithm that introduces the minimax theorem into the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. M3DDPG also suffers from unstable convergence caused by sparse sample data and randomization. The Particle Swarm Optimization (PSO) algorithm, unlike traditional reinforcement learning methods, constructs independent populations of policy networks to generate sample data, which are then used to train the reinforcement learning algorithm. PSO optimizes and updates the policy population based on a fitness function, aiming to improve the efficiency and convergence speed with which the algorithm learns from the sample data. To address the multi-agent pursuit-evasion problem, we propose the PSO-M3DDPG algorithm, which combines the PSO algorithm with the M3DDPG algorithm. In simulation experiments, the improved algorithm achieves better training results and faster convergence, which validates its effectiveness.
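The population update mentioned in the abstract can be illustrated with a minimal, generic PSO sketch. This is not the authors' implementation: the record does not specify the fitness function or how PSO is coupled to M3DDPG, so `toy_fitness`, `pso_step`, `run_demo`, and the coefficients `w`, `c1`, `c2` are all hypothetical. Each particle is assumed to stand in for a flattened policy parameter vector, and the toy fitness stands in for an episode return.

```python
import random


def pso_step(positions, velocities, personal_best, personal_best_fit,
             fitness, w=0.7, c1=1.5, c2=1.5, rng=random):
    """One textbook PSO update over a population of flat parameter vectors.

    Maximizes `fitness`. All lists are modified in place.
    """
    # The global best is the best personal best found so far.
    g_idx = max(range(len(personal_best)), key=lambda i: personal_best_fit[i])
    global_best = list(personal_best[g_idx])
    for i, (x, v) in enumerate(zip(positions, velocities)):
        for d in range(len(x)):
            r1, r2 = rng.random(), rng.random()
            # Inertia + pull toward own best + pull toward population best.
            v[d] = (w * v[d]
                    + c1 * r1 * (personal_best[i][d] - x[d])
                    + c2 * r2 * (global_best[d] - x[d]))
            x[d] += v[d]
        f = fitness(x)
        if f > personal_best_fit[i]:
            personal_best[i] = list(x)
            personal_best_fit[i] = f
    return max(personal_best_fit)


def toy_fitness(params):
    # Hypothetical stand-in for an episode return; its maximum (0) is at (1, -2).
    return -((params[0] - 1.0) ** 2 + (params[1] + 2.0) ** 2)


def run_demo(n_particles=10, n_iters=50, seed=0):
    rng = random.Random(seed)
    positions = [[rng.uniform(-5, 5), rng.uniform(-5, 5)]
                 for _ in range(n_particles)]
    velocities = [[0.0, 0.0] for _ in range(n_particles)]
    personal_best = [list(p) for p in positions]
    personal_best_fit = [toy_fitness(p) for p in positions]
    initial_best = max(personal_best_fit)
    best = initial_best
    for _ in range(n_iters):
        best = pso_step(positions, velocities, personal_best,
                        personal_best_fit, toy_fitness, rng=rng)
    return initial_best, best
```

Calling `run_demo()` returns the best fitness before and after the updates. In a setting like the one the abstract describes, the fitness would presumably be computed from environment rollouts of each policy rather than from a closed-form function.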
- Published
- 2024