A Hierarchical Deep Deterministic Policy Gradients for Swarm Navigation

Authors :: Hung The Nguyen
Tung Nguyen
Do-Van Nguyen
Thanh-Ha Le
Source :: KSE
Publication Year :: 2019
Publisher :: IEEE, 2019.
Abstract: Swarm navigation has recently become a significant topic because of its suitability for applications such as search and rescue with autonomous systems. However, a swarm of learning agents faces two challenges: huge state spaces and a lack of scalability as the size of the swarm increases. Reinforcement learning (RL) approaches, which allow agents to interact not only with each other but also with their operational environment to obtain optimal policies, are considered promising techniques for swarm navigation problems. Different RL algorithms have been used to solve these problems, but most of them are limited to discrete state spaces and/or do not scale well as the number of learning agents in the swarm increases. In this paper, we propose a Swarm Hierarchical Deep Deterministic Policy Gradients (SH-DDPGs) framework to address both drawbacks in the context of leader-follower swarm navigation. By decomposing the navigation task of the swarm into two primitive sub-tasks, leader-following and collision avoidance, we can guarantee the convergence of the training processes of these sub-tasks in a continuous environment before combining the output actions produced by the trained models to complete the entire task. Moreover, our method is scalable because it is independent of the size of the swarm. First, when training a follower, we use only information about the leader and the neighbors within the follower's local view. Second, the trained model of one follower can be reused for the remaining followers. Training results show that the proposed SH-DDPGs algorithm converges quickly and allows follower agents to learn an optimal policy for the whole group to navigate through the environment without colliding with each other, while flexibly optimizing their formation so that the distances among agents are minimized.
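
The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two policy functions are hand-written stand-ins for the trained DDPG models, and every name here (`neighbors_in_view`, `combined_action`, `view_radius`, `w_avoid`), along with the weighted-sum rule for combining the two actions, is an assumption made for illustration only. The abstract does not state how the output actions are combined.

```python
import math


def neighbors_in_view(follower, others, view_radius):
    """Keep only the agents within the follower's local view (assumed circular)."""
    return [p for p in others if math.dist(follower, p) <= view_radius]


def leader_following_policy(follower, leader):
    """Stand-in for the trained leader-following model: unit step toward the leader."""
    dx, dy = leader[0] - follower[0], leader[1] - follower[1]
    norm = math.hypot(dx, dy)
    return (0.0, 0.0) if norm == 0 else (dx / norm, dy / norm)


def collision_avoidance_policy(follower, neighbors):
    """Stand-in for the trained collision-avoidance model: push away from neighbors."""
    ax = ay = 0.0
    for n in neighbors:
        dx, dy = follower[0] - n[0], follower[1] - n[1]
        d = math.hypot(dx, dy)
        if d > 0:
            # Closer neighbors repel more strongly (inverse-distance weighting).
            ax += dx / d**2
            ay += dy / d**2
    return (ax, ay)


def combined_action(follower, leader, others, view_radius=2.0, w_avoid=0.5):
    """Combine the two sub-task actions with an assumed weighted sum."""
    near = neighbors_in_view(follower, others, view_radius)
    fx, fy = leader_following_policy(follower, leader)
    cx, cy = collision_avoidance_policy(follower, near)
    return (fx + w_avoid * cx, fy + w_avoid * cy)


def step_swarm(followers, leader, **kwargs):
    """Apply the same policy to every follower, mirroring the shared-model idea."""
    actions = []
    for i, f in enumerate(followers):
        others = followers[:i] + followers[i + 1:]
        actions.append(combined_action(f, leader, others, **kwargs))
    return actions
```

Because each follower sees only the leader and its nearby neighbors, the per-agent input in this sketch does not grow with the swarm size, which is the scalability property the abstract claims for the method.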

Subjects :: 0209 industrial biotechnology
Computer science
business.industry
Swarm behaviour
Context (language use)
02 engineering and technology
Task (project management)
020901 industrial engineering & automation
Convergence (routing)
Scalability
0202 electrical engineering, electronic engineering, information engineering
Task analysis
Robot
Reinforcement learning
020201 artificial intelligence & image processing
Artificial intelligence
business

Details

Database :: OpenAIRE
Journal :: 2019 11th International Conference on Knowledge and Systems Engineering (KSE)
Accession number :: edsair.doi...........942784ee9a250248d3c112a0039b0647
Full Text :: https://doi.org/10.1109/kse.2019.8919269