1. IECA: An In-Execution Configuration CNN Accelerator With 30.55 GOPS/mm² Area Efficiency.
- Author
-
Huang, Boming; Huan, Yuxiang; Chu, Haoming; Xu, Jiawei; Liu, Lizheng; Zheng, Lirong; and Zou, Zhuo
- Subjects
- *
CONVOLUTIONAL neural networks , *FINITE state machines , *TILES - Abstract
It remains challenging for a Convolutional Neural Network (CNN) accelerator to maintain high hardware utilization and low processing latency with restricted on-chip memory. This paper presents an In-Execution Configuration Accelerator (IECA) that realizes an efficient control scheme, exploring architectural data reuse, unified in-execution controlling, and pipelined latency hiding to minimize configuration overhead out of the computation scope. The proposed IECA achieves row-wise convolution with tiny distributed buffers and reduces the size of total on-chip memory by removing 40% of redundant memory storage with shared delay chains. By exploiting a reconfigurable Sequence Mapping Table (SMT) and Finite State Machine (FSM) control, the chip realizes cycle-accurate Processing Element (PE) control, automatic loop tiling and latency hiding without extra time slots for pre-configuration. Evaluated on AlexNet and VGG-16, the IECA retains over 97.3% PE utilization and over 95.6% memory access time hiding on average. The chip is designed and fabricated in a UMC 55-nm process running at a frequency of 250 MHz and achieves an area efficiency of 30.55 GOPS/mm² and 0.244 GOPS/KGE (kilo-gate-equivalent), which makes an over 2.0× and 2.1× improvement, respectively, compared with that of previous related works. Implementation of the IEC control scheme uses only a 0.55% area of the 2.75 mm² core. [ABSTRACT FROM AUTHOR]
- Published
- 2021