The Asynchronous Training Algorithm Based on Sampling and Mean Fusion for Distributed RNN
- Author
Dejiao Niu, Tianquan Liu, Tao Cai, and Shijie Zhou
- Subjects
Asynchronous training, distributed recurrent neural network, mean fusion, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Training large-scale deep neural networks with distributed implementations is an effective way to improve efficiency. However, the high network communication cost of synchronizing gradients and parameters is a major bottleneck in distributed training. In this work, we propose an asynchronous training algorithm based on sampling and mean fusion for the distributed recurrent neural network (RNN). In the distributed RNN, multiple distributed neuron nodes and an interaction node work together to carry out training. The synchronization overhead is reduced by a unique asynchronous sampling strategy among the distributed neuron nodes. Then, to compensate for the accuracy loss caused by asynchronous parameter updates, a mean fusion algorithm is proposed in which the interaction node averages all local parameters from the distributed neuron nodes. We mathematically prove the convergence of the proposed algorithm. Experimental verification is performed on two language modeling benchmark datasets. The results demonstrate significant speed gains for the distributed RNN, while the accuracy loss is less than 1% on average.
- Published
- 2020