1. Configurable Multi-directional Systolic Array Architecture for Convolutional Neural Networks
- Author
-
Yang Guo, Yaohua Wang, Rui Xu, Sheng Ma, and Xinhai Chen
- Subjects
010302 applied physics ,Speedup ,business.industry ,Dataflow ,Computer science ,Systolic array ,02 engineering and technology ,Energy consumption ,01 natural sciences ,Convolutional neural network ,020202 computer hardware & architecture ,Convolution ,Transmission (telecommunications) ,Hardware and Architecture ,0103 physical sciences ,0202 electrical engineering, electronic engineering, information engineering ,Hardware acceleration ,business ,Software ,Computer hardware ,Information Systems - Abstract
The systolic array architecture is one of the most popular choices for convolutional neural network hardware accelerators. The biggest advantage of the systolic array architecture is its simple and efficient design principle. Without complicated control and dataflow, hardware accelerators with the systolic array can calculate traditional convolution very efficiently. However, this advantage also brings new challenges to the systolic array. When computing special types of convolution, such as the small-scale convolution or depthwise convolution, the processing element (PE) utilization rate of the array decreases sharply. The main reason is that the simple architecture design limits the flexibility of the systolic array. In this article, we design a configurable multi-directional systolic array (CMSA) to address these issues. First, we added a data path to the systolic array. It allows users to split the systolic array through configuration to speed up the calculation of small-scale convolution. Second, we redesigned the PE unit so that the array has multiple data transmission modes and dataflow strategies. This allows users to switch the dataflow of the PE array to speed up the calculation of depthwise convolution. In addition, unlike other works, we only make a few changes and modifications to the existing systolic array architecture. It avoids additional hardware overheads and can be easily deployed in application scenarios that require small systolic arrays such as mobile terminals. Based on our evaluation, CMSA can increase the PE utilization rate by up to 1.6 times compared to the typical systolic array when running the last layers of ResNet-18. When running depthwise convolution in MobileNet, CMSA can increase the utilization rate by up to 14.8 times. At the same time, CMSA and the traditional systolic arrays are similar in area and energy consumption.
- Published
- 2021
- Full Text
- View/download PDF