
Asynchronous SGD with stale gradient dynamic adjustment for deep learning training.

Authors :
Tan, Tao
Xie, Hong
Xia, Yunni
Shi, Xiaoyu
Shang, Mingsheng
Source :
Information Sciences. Oct 2024, Vol. 681.
Publication Year :
2024

Abstract

Asynchronous stochastic gradient descent (ASGD) is a computationally efficient algorithm that speeds up deep learning training and plays an important role in distributed deep learning. However, ASGD suffers from the stale gradient problem: a worker's gradient may be computed on weights that no longer match the parameter server's current weights. This problem seriously degrades model performance and can even cause divergence. To address this issue, this paper designs a dynamic adjustment scheme based on the momentum algorithm that combines a staleness penalty with staleness compensation: the penalty reduces trust in a stale gradient, while the compensation offsets the harm the stale gradient causes. Based on this dynamic adjustment scheme, this paper proposes a dynamic asynchronous stochastic gradient descent algorithm (DASGD), which dynamically adjusts the compensation factor and the penalty factor according to the degree of staleness. Moreover, we prove that DASGD converges under some mild assumptions. Finally, we build a real distributed training cluster to evaluate DASGD on the Cifar10 and ImageNet datasets. Compared with four SOTA baselines, experimental results confirm the superior performance of DASGD. More specifically, DASGD achieves nearly the same test accuracy as SGD on Cifar10 and ImageNet, while using only around 27.6% and 40.8% of SGD's training time, respectively. [ABSTRACT FROM AUTHOR]
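The abstract describes a server-side update that scales down trust in a stale gradient (penalty) while adding a momentum-based correction (compensation), with both factors driven by the staleness. The paper's exact formulas are not given in the abstract, so the sketch below uses hypothetical penalty and compensation factors (`1/(1+τ)` and `τ/(1+τ)`, where `τ` is the staleness) purely to illustrate the idea; the function name and factor forms are assumptions, not the authors' method.

```python
def dasgd_update(weights, grad_stale, staleness, momentum, lr=0.01, beta=0.9):
    """Illustrative staleness-aware update (hypothetical factors, not the
    paper's exact scheme).

    weights     -- current parameter-server weights (list of floats)
    grad_stale  -- gradient a worker computed on older weights
    staleness   -- number of updates the server applied since the worker
                   pulled its weights (0 means the gradient is fresh)
    momentum    -- running momentum buffer, same shape as weights
    """
    # Penalty factor: trust a gradient less the staler it is (assumed form).
    penalty = 1.0 / (1.0 + staleness)
    # Compensation factor: lean more on momentum as staleness grows (assumed form).
    compensation = staleness / (1.0 + staleness)

    # Standard exponential-moving-average momentum update.
    new_momentum = [beta * m + (1.0 - beta) * g
                    for m, g in zip(momentum, grad_stale)]

    # Blend the penalized stale gradient with the momentum-based compensation.
    adjusted = [penalty * g + compensation * m
                for g, m in zip(grad_stale, new_momentum)]

    new_weights = [w - lr * a for w, a in zip(weights, adjusted)]
    return new_weights, new_momentum
```

Note that with `staleness = 0` the penalty is 1 and the compensation is 0, so the update reduces to a plain SGD step; as staleness grows, the raw gradient's influence shrinks and the momentum direction takes over.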

Subjects

Subjects :
*DEEP learning
*ALGORITHMS

Details

Language :
English
ISSN :
0020-0255
Volume :
681
Database :
Academic Search Index
Journal :
Information Sciences
Publication Type :
Periodical
Accession number :
178885127
Full Text :
https://doi.org/10.1016/j.ins.2024.121220