Two-stage ASGD framework for parallel training of DNN acoustic models using Ethernet
- Author
Xin Li, Yonghong Yan, Xingyu Na, Jielin Pan, and Zhichao Wang
- Subjects
Ethernet, Data set, Acceleration, Stochastic gradient descent, Asynchronous communication, Computer science, Computer cluster, Node (networking), Speech recognition, InfiniBand
- Abstract
Deep neural networks (DNNs) have shown significant improvements in acoustic modelling, pushing state-of-the-art performance in large vocabulary continuous speech recognition (LVCSR) tasks. However, training DNNs on large-scale data is very time-consuming. In this paper, a data-parallel method, namely two-stage ASGD, is proposed. Two-stage ASGD is based on the asynchronous stochastic gradient descent (ASGD) paradigm and is tuned for a GPU-equipped computing cluster connected by 10 Gbit/s Ethernet rather than InfiniBand. Several techniques, such as hierarchical learning rate control, double-buffering and order-locking, are applied to optimise the computation-to-communication ratio. The proposed framework is evaluated by training a DNN with 29.5M parameters on a 500-hour Chinese continuous telephone speech data set. Using 4 computing nodes and 8 GPU devices (2 devices per node), a 5.9-fold speedup over a single GPU is obtained with an acceptable loss of accuracy (0.5% on average). A comparative experiment contrasts the proposed two-stage ASGD with the parallel DNN training systems reported in prior work.
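The abstract names its techniques without detail. As a rough, self-contained illustration of two of them, asynchronous push/pull against a parameter server and double-buffering that overlaps gradient communication with the next mini-batch's computation, the toy Python sketch below uses threads in place of GPU nodes. Every name in it (ParameterServer, push_pull, the quadratic stand-in gradient) is hypothetical and not from the paper; hierarchical learning rate control and order-locking are not modelled.

```python
import threading
import queue

import numpy as np


class ParameterServer:
    """Toy stand-in for the central server in an ASGD setup.

    Hypothetical: the paper's hierarchical learning rate control and
    GPU/Ethernet transport are not modelled here.
    """

    def __init__(self, dim, lr=0.1):
        self.params = np.ones(dim)
        self.lr = lr
        self.lock = threading.Lock()

    def push_pull(self, grad):
        # Apply one worker's (possibly stale) gradient and hand back
        # the updated global parameters in a single exchange.
        with self.lock:
            self.params -= self.lr * grad
            return self.params.copy()


def worker(server, steps=50):
    params = server.push_pull(np.zeros_like(server.params))  # initial fetch
    fresh = queue.Queue(maxsize=1)

    def communicate(grad):
        # Background exchange: runs while the worker computes its
        # next mini-batch, which is the double-buffering idea.
        fresh.put(server.push_pull(grad))

    comm = None
    for _ in range(steps):
        # Fake mini-batch gradient of f(p) = ||p||^2, computed on
        # possibly stale parameters; a real worker would run GPU
        # backprop on an acoustic-model mini-batch here.
        grad = 2.0 * params
        if comm is not None:
            comm.join()
            params = fresh.get()  # swap in the freshly pulled buffer
        comm = threading.Thread(target=communicate, args=(grad,))
        comm.start()
    comm.join()
    fresh.get()


server = ParameterServer(dim=4)
workers = [threading.Thread(target=worker, args=(server,)) for _ in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()
print("final params:", server.params)  # driven towards zero
```

In a real deployment the communicate step would be a network exchange over 10 Gbit/s Ethernet, so hiding it behind the next mini-batch's computation is what keeps the workers busy despite the lower bandwidth relative to InfiniBand.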
- Published
2015