Back to Search Start Over

Adaptive Distributed Parallel Training Method for a Deep Learning Model Based on Dynamic Critical Paths of DAG.

Authors :
Zeng, Yan
Wang, Wei
Ding, Yong
Zhang, Jilin
Ren, Yongjian
Yi, Guangzheng
Source :
Mathematics (2227-7390). Dec2022, Vol. 10 Issue 24, p4788. 21p.
Publication Year :
2022

Abstract

AI provides a new method for massive simulated data calculations in molecular dynamics, materials, and other scientific computing fields. However, the complex structures and large-scale parameters of neural network models make them difficult to develop and train. The automatic parallel technology based on graph algorithms is one of the most promising methods to solve this problem, despite the low efficiency in the design, implementation, and execution of distributed parallel policies for large-scale neural network models. In this paper, we propose an adaptive distributed parallel training method based on the dynamic generation of critical DAG (directed acyclic graph) paths, called FD-DPS, to solve this efficiency problem. Firstly, the proposed model splits operators with the dimension of the tensor, which can expand the space available for model parallelism. Secondly, a dynamic critical path generation method is employed to determine node priority changes in the DAG of the neural network models. Finally, the model implements the optimal scheduling of critical paths based on the priority of the nodes, thereby improving the performance of parallel strategies. Our experiments show that FD-DPS can achieve 12.76% and 11.78% faster training on PnasNet_mobile and ResNet_200 models, respectively, compared with the MP-DPS and Fast methods. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
22277390
Volume :
10
Issue :
24
Database :
Academic Search Index
Journal :
Mathematics (2227-7390)
Publication Type :
Academic Journal
Accession number :
161003123
Full Text :
https://doi.org/10.3390/math10244788