
Balancing Computation Loads and Optimizing Input Vector Loading in LSTM Accelerators.

Authors :
Park, Junki
Yi, Wooseok
Ahn, Daehyun
Kung, Jaeha
Kim, Jae-Joon
Source :
IEEE Transactions on Computer-Aided Design of Integrated Circuits & Systems. Sep 2020, Vol. 39 Issue 9, p1889-1901. 13p.
Publication Year :
2020

Abstract

The long short-term memory (LSTM) is a widely used neural network model for processing time-varying data. To reduce the memory requirement, pruning is often applied to the weight matrices of the LSTM, which makes them sparse. In this paper, we present a new sparse matrix format, named rearranged compressed sparse column (RCSC), to maximize the inference speed of the LSTM hardware accelerator. The RCSC format speeds up inference by: 1) evenly distributing the computation loads across processing elements (PEs) and 2) reducing input vector load misses in the local buffer. We also propose a hardware architecture adopting a hierarchical input buffer to further reduce the pipeline stalls which cannot be handled by the RCSC format alone. Simulation results for various datasets show that the combined use of the RCSC format and the proposed hardware achieves, on average, a 2× smaller inference runtime compared to the previous work. [ABSTRACT FROM AUTHOR]
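To make the load-balancing idea in the abstract concrete, below is a minimal Python sketch of the standard compressed sparse column (CSC) format that RCSC builds on, together with a toy greedy rearrangement that assigns rows of a pruned weight matrix to PEs so nonzero counts are balanced. The exact RCSC rearrangement rules are not given in this abstract, so the greedy heuristic, the example matrix `W`, and the PE count are illustrative assumptions only.

```python
import numpy as np

# Toy pruned weight matrix; zeros are weights removed by pruning.
# (Hypothetical example, not from the paper.)
W = np.array([
    [0, 2, 0, 1],
    [3, 0, 0, 0],
    [0, 4, 5, 0],
    [6, 0, 7, 8],
])

# Standard CSC encoding: nonzero values, their row indices,
# and column pointers marking where each column's nonzeros start.
values, row_idx, col_ptr = [], [], [0]
for j in range(W.shape[1]):
    for i in range(W.shape[0]):
        if W[i, j] != 0:
            values.append(int(W[i, j]))
            row_idx.append(i)
    col_ptr.append(len(values))

# Pruning leaves uneven nonzero counts per row, so a naive row-to-PE
# mapping leaves some PEs idle. A greedy balance (illustrative, not the
# paper's RCSC algorithm): assign rows, heaviest first, to the PE with
# the smallest current load.
n_pe = 2
nnz_per_row = (W != 0).sum(axis=1)
loads = [0] * n_pe
assignment = {}
for r in np.argsort(-nnz_per_row):
    pe = loads.index(min(loads))
    assignment[int(r)] = pe
    loads[pe] += int(nnz_per_row[r])

print("col_ptr:", col_ptr)          # [0, 2, 4, 6, 8]
print("PE loads:", loads)           # [4, 4] -- evenly balanced
```

With a balanced assignment, no PE sits idle waiting for a heavily loaded neighbor, which is the first of the two stall sources the RCSC format targets.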

Details

Language :
English
ISSN :
0278-0070
Volume :
39
Issue :
9
Database :
Academic Search Index
Journal :
IEEE Transactions on Computer-Aided Design of Integrated Circuits & Systems
Publication Type :
Academic Journal
Accession number :
145287418
Full Text :
https://doi.org/10.1109/TCAD.2019.2926482