Two-stage ASGD framework for parallel training of DNN acoustic models using Ethernet
- Author
Xin Li, Yonghong Yan, Xingyu Na, Jielin Pan, and Zhichao Wang
- Subjects
Ethernet, Data set, Acceleration, Stochastic gradient descent, Asynchronous communication, Computer science, Computer cluster, Node (networking), Speech recognition, InfiniBand
- Abstract
Deep neural networks (DNNs) have shown significant improvements in acoustic modelling, pushing state-of-the-art performance in large vocabulary continuous speech recognition (LVCSR) tasks. However, training DNNs on large-scale data is very time-consuming. In this paper, a data-parallel method, namely two-stage ASGD, is proposed. Two-stage ASGD is based on the asynchronous stochastic gradient descent (ASGD) paradigm and is tuned for a GPU-equipped computing cluster connected by 10 Gbit/s Ethernet rather than InfiniBand. Several techniques, such as hierarchical learning rate control, double-buffering and order-locking, are applied to optimise the computation-to-communication ratio. The proposed framework is evaluated by training a DNN with 29.5M parameters on a 500-hour Chinese continuous telephone speech data set. Using 4 computing nodes and 8 GPU devices (2 devices per node), a 5.9-fold speedup over a single GPU is obtained with an acceptable loss of accuracy (0.5% on average). A comparative experiment contrasts the proposed two-stage ASGD with the parallel DNN training systems reported in prior work.
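The abstract names its techniques without detail. As a rough, self-contained illustration of two of them, asynchronous push/pull against a parameter server and double-buffering that overlaps gradient communication with the next mini-batch's computation, the toy Python sketch below uses threads in place of GPU nodes. Every name in it (ParameterServer, push_pull, the quadratic stand-in gradient) is hypothetical and not from the paper; hierarchical learning rate control and order-locking are not modelled.

```python
import threading
import queue

import numpy as np


class ParameterServer:
    """Toy stand-in for the central server in an ASGD setup.

    Hypothetical: the paper's hierarchical learning rate control and
    GPU/Ethernet transport are not modelled here.
    """

    def __init__(self, dim, lr=0.1):
        self.params = np.ones(dim)
        self.lr = lr
        self.lock = threading.Lock()

    def push_pull(self, grad):
        # Apply one worker's (possibly stale) gradient and hand back
        # the updated global parameters in a single exchange.
        with self.lock:
            self.params -= self.lr * grad
            return self.params.copy()


def worker(server, steps=50):
    params = server.push_pull(np.zeros_like(server.params))  # initial fetch
    fresh = queue.Queue(maxsize=1)

    def communicate(grad):
        # Background exchange: runs while the worker computes its
        # next mini-batch, which is the double-buffering idea.
        fresh.put(server.push_pull(grad))

    comm = None
    for _ in range(steps):
        # Fake mini-batch gradient of f(p) = ||p||^2, computed on
        # possibly stale parameters; a real worker would run GPU
        # backprop on an acoustic-model mini-batch here.
        grad = 2.0 * params
        if comm is not None:
            comm.join()
            params = fresh.get()  # swap in the freshly pulled buffer
        comm = threading.Thread(target=communicate, args=(grad,))
        comm.start()
    comm.join()
    fresh.get()


server = ParameterServer(dim=4)
workers = [threading.Thread(target=worker, args=(server,)) for _ in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()
print("final params:", server.params)  # driven towards zero
```

In a real deployment the communicate step would be a network exchange over 10 Gbit/s Ethernet, so hiding it behind the next mini-batch's computation is what keeps the workers busy despite the lower bandwidth relative to InfiniBand.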
- Published
2015