Fused Architecture for Dense and Sparse Matrix Processing in TensorFlow Lite.

Authors :
Nunez-Yanez, Jose
Source :
IEEE Micro. Nov/Dec 2022, Vol. 42 Issue 6, p55-66. 12p.
Publication Year :
2022

Abstract

In this article, we present a hardware architecture optimized for sparse and dense matrix processing in TensorFlow Lite and compatible with embedded heterogeneous devices that integrate central processing unit (CPU) and field-programmable gate array (FPGA) resources. The fused architecture for dense and sparse matrices offers multiple configuration options that trade off parallelism against complexity, and uses a dataflow model to create four stages that read, compute, scale, and write results. All stages are designed to support TensorFlow Lite operations, including asymmetric quantized activations, column-major matrix write, per-filter/per-axis bias values, and current scaling specifications. The configurable accelerator is integrated with the TensorFlow Lite inference engine running on the ARMv8 processor. We compare performance, power, and energy with the state-of-the-art RUY software multiplication library, showing up to 18× and 48× acceleration in dense and sparse modes, respectively. The sparse mode benefits from structural pruning to fully utilize the digital signal processing blocks present in the FPGA device. [ABSTRACT FROM AUTHOR]
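
The four dataflow stages named in the abstract (read, compute, scale, write) can be illustrated with a small software sketch. This is not the authors' hardware design. It is a plain-Python model written under several assumptions: the function names are hypothetical, the scale stage uses a floating-point per-filter multiplier with round-half-up instead of the fixed-point multiplier and shift that TensorFlow Lite uses, and outputs are clamped to the signed 8-bit range.

```python
import math

def read_stage(activations, act_zero_point):
    # Asymmetric quantization: remove the activation zero point on read.
    return [[a - act_zero_point for a in row] for row in activations]

def compute_stage(weights, shifted):
    # Integer multiply-accumulate: acc[m][n] = sum_k weights[m][k] * shifted[k][n].
    depth, cols = len(shifted), len(shifted[0])
    return [[sum(w_row[k] * shifted[k][n] for k in range(depth))
             for n in range(cols)] for w_row in weights]

def scale_stage(acc, bias, scales, out_zero_point):
    # Per-filter (per-axis) bias and scale, then requantize to int8.
    # Assumption: float scale with round-half-up stands in for fixed-point scaling.
    out = []
    for m, row in enumerate(acc):
        scaled = [math.floor((v + bias[m]) * scales[m] + 0.5) + out_zero_point
                  for v in row]
        out.append([max(-128, min(127, v)) for v in scaled])
    return out

def write_stage(result):
    # Column-major write: emit all rows of column 0, then column 1, and so on.
    rows, cols = len(result), len(result[0])
    return [result[m][n] for n in range(cols) for m in range(rows)]

def run_pipeline(weights, activations, act_zero_point, bias, scales, out_zero_point):
    shifted = read_stage(activations, act_zero_point)
    acc = compute_stage(weights, shifted)
    scaled = scale_stage(acc, bias, scales, out_zero_point)
    return write_stage(scaled)

if __name__ == "__main__":
    weights = [[1, -2, 3], [0, 4, -1]]          # 2 filters x depth 3
    activations = [[10, 12], [8, 9], [11, 7]]   # depth 3 x 2 columns
    print(run_pipeline(weights, activations, 8, [5, -3], [0.5, 0.25], 3))
```

In this sketch each stage is an ordinary function called in sequence. In a dataflow accelerator the stages would instead run concurrently and pass data through streams, which is where the parallelism described in the abstract comes from.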

Details

Language :
English
ISSN :
02721732
Volume :
42
Issue :
6
Database :
Academic Search Index
Journal :
IEEE Micro
Publication Type :
Academic Journal
Accession number :
160651832
Full Text :
https://doi.org/10.1109/MM.2022.3196705