Fused Architecture for Dense and Sparse Matrix Processing in TensorFlow Lite.

Authors :: Nunez-Yanez, Jose
Source :: IEEE Micro. Nov/Dec2022, Vol. 42 Issue 6, p55-66. 12p.
Publication Year :: 2022
Abstract: In this article, we present a hardware architecture optimized for sparse and dense matrix processing in TensorFlow Lite and compatible with embedded-heterogeneous devices that integrate central processing unit and field-programmable gate array (FPGA) resources. The fused architecture for dense and sparse matrices design offers multiple configuration options that tradeoff parallelism and complexity, and uses a dataflow model to create four stages that read, compute, scale, and write results. All stages are designed to support TensorFlow Lite operations including asymmetric quantized activations, column-major matrix write, per-filter/per-axis bias values, and current scaling specifications. The configurable accelerator is integrated with the TensorFlow Lite inference engine running on the ARMv8 processor. We compare performance/power/energy with the state-of-the-art RUY software multiplication library showing up to 18× acceleration and 48× in dense and sparse modes, respectively. The sparse mode benefits from structural pruning to fully utilize the digital signal processing blocks present in the FPGA device. [ABSTRACT FROM AUTHOR]