Fused Architecture for Dense and Sparse Matrix Processing in TensorFlow Lite.
- Source :
- IEEE Micro. Nov/Dec 2022, Vol. 42 Issue 6, p55-66. 12p.
- Publication Year :
- 2022
Abstract
- In this article, we present a hardware architecture optimized for sparse and dense matrix processing in TensorFlow Lite and compatible with embedded heterogeneous devices that integrate central processing unit (CPU) and field-programmable gate array (FPGA) resources. The fused architecture for dense and sparse matrices offers multiple configuration options that trade off parallelism and complexity, and uses a dataflow model to create four stages that read, compute, scale, and write results. All stages are designed to support TensorFlow Lite operations, including asymmetric quantized activations, column-major matrix write, per-filter/per-axis bias values, and current scaling specifications. The configurable accelerator is integrated with the TensorFlow Lite inference engine running on the ARMv8 processor. We compare performance, power, and energy with the state-of-the-art RUY software multiplication library, showing up to 18× and 48× acceleration in dense and sparse modes, respectively. The sparse mode benefits from structural pruning to fully utilize the digital signal processing blocks present in the FPGA device. [ABSTRACT FROM AUTHOR]
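The four stages named in the abstract (read, compute, scale, write) can be illustrated with a minimal software sketch. This is not the authors' hardware design: the function names, the floating-point per-filter multipliers (TensorFlow Lite itself uses fixed-point multipliers and shifts), the int8 output range, and the round-half-away-from-zero rule are all assumptions chosen for readability.

```python
import math


def read_stage(acts_q, act_zero_point):
    # Remove the asymmetric activation zero point so values are centred on zero.
    return [[a - act_zero_point for a in row] for row in acts_q]


def compute_stage(weights_q, acts_centered):
    # Integer multiply-accumulate: (filters x depth) times (depth x columns).
    rows = len(weights_q)
    depth = len(acts_centered)
    cols = len(acts_centered[0])
    return [
        [sum(weights_q[r][k] * acts_centered[k][c] for k in range(depth))
         for c in range(cols)]
        for r in range(rows)
    ]


def scale_stage(acc, bias, multipliers, out_zero_point):
    # Add the per-filter bias, rescale with a per-filter multiplier
    # (a float here, by assumption), round half away from zero,
    # add the output zero point, and clamp to the int8 range.
    out = []
    for r, row in enumerate(acc):
        scaled_row = []
        for value in row:
            x = (value + bias[r]) * multipliers[r]
            rounded = int(math.floor(abs(x) + 0.5))
            if x < 0:
                rounded = -rounded
            q = rounded + out_zero_point
            scaled_row.append(max(-128, min(127, q)))
        out.append(scaled_row)
    return out


def write_stage(matrix):
    # Flatten the result in column-major order.
    rows = len(matrix)
    cols = len(matrix[0])
    return [matrix[r][c] for c in range(cols) for r in range(rows)]


def quantized_matmul(weights_q, acts_q, act_zero_point, bias, multipliers,
                     out_zero_point):
    centered = read_stage(acts_q, act_zero_point)
    acc = compute_stage(weights_q, centered)
    scaled = scale_stage(acc, bias, multipliers, out_zero_point)
    return write_stage(scaled)


if __name__ == "__main__":
    result = quantized_matmul(
        weights_q=[[1, 2], [3, -4]],
        acts_q=[[5, 6], [7, 8]],
        act_zero_point=5,
        bias=[10, -1],
        multipliers=[0.5, 0.25],
        out_zero_point=1,
    )
    print(result)
```

In this sketch each row of the weight matrix is one filter, so the bias and multiplier lists hold one entry per row, mirroring the per-filter/per-axis parameters the abstract mentions.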
Details
- Language :
- English
- ISSN :
- 02721732
- Volume :
- 42
- Issue :
- 6
- Database :
- Academic Search Index
- Journal :
- IEEE Micro
- Publication Type :
- Academic Journal
- Accession number :
- 160651832
- Full Text :
- https://doi.org/10.1109/MM.2022.3196705