
Unified Scaling-Based Pure-Integer Quantization for Low-Power Accelerator of Complex CNNs

Authors :
Ali A. Al-Hamid; HyungWon Kim
Source :
Electronics; Volume 12; Issue 12; Pages: 2660
Publication Year :
2023
Publisher :
Multidisciplinary Digital Publishing Institute, 2023.

Abstract

Although optimizing deep neural networks is becoming crucial for deploying the networks on edge AI devices, it faces increasing challenges due to scarce hardware resources in modern IoT and mobile devices. This study proposes a quantization method that can quantize all internal computations and parameters in memory. Unlike most previous methods, which primarily focused on relatively simple CNN models for image classification, the proposed method, Unified Scaling-Based Pure-Integer Quantization (USPIQ), can handle more complex CNN models for object detection. USPIQ provides a systematic approach to converting all floating-point operations into pure-integer operations in every model layer. This significantly reduces the computational overhead and makes the model more suitable for low-power neural network accelerator hardware consisting of pure-integer datapaths and small memory, aimed at low power consumption and small chip size. The proposed method optimally calibrates the scale parameters for each layer using a subset of unlabeled representative images. Furthermore, we introduce the notion of a Unified Scale Factor (USF), which combines the conventional two-step scaling process (quantization and dequantization) into a single process for each layer. As a result, it improves both the inference speed and the accuracy of the resulting quantized model. Our experiment on YOLOv5 models demonstrates that USPIQ can significantly reduce the on-chip memory for parameters and activation data by approximately 75% and 43.68%, respectively, compared with the floating-point model. These reductions are achieved with a minimal loss in mAP@0.5 of at most 0.61%. In addition, USPIQ exhibits a significant improvement in inference speed compared to ONNX Runtime quantization, achieving a speedup of 1.64 to 2.84 times. We also demonstrate that USPIQ outperforms previous methods in terms of accuracy and hardware reduction for 8-bit quantization of all YOLOv5 versions.
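The core idea of the Unified Scale Factor can be illustrated with a minimal sketch: instead of quantizing inputs and dequantizing outputs as two separate scaling steps, the combined factor is applied once per layer as a fixed-point integer multiply-and-shift, so the datapath never touches floating point. This is a simplified illustration of the general technique, not the paper's implementation; the variable names, the 16-bit shift width, and the toy layer are assumptions.

```python
import numpy as np

def quantize(x, scale):
    """Symmetric per-tensor quantization of floats to the int8 range."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int64)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8)).astype(np.float32)   # toy layer weights
x = rng.normal(size=8).astype(np.float32)        # input activations
y_float = W @ x                                  # float reference output

# Scales calibrated from data (the paper calibrates each layer using
# a subset of unlabeled representative images).
s_w = np.abs(W).max() / 127
s_x = np.abs(x).max() / 127
s_y = np.abs(y_float).max() / 127

# Unified scale: fold the quantize and dequantize scalings into one
# per-layer factor s_w * s_x / s_y, realized as a fixed-point multiplier
# M with a right shift so the computation stays pure-integer.
SHIFT = 16  # assumed fixed-point precision
M = int(round((s_w * s_x / s_y) * (1 << SHIFT)))

W_q, x_q = quantize(W, s_w), quantize(x, s_x)
acc = W_q @ x_q           # integer accumulator
y_q = (acc * M) >> SHIFT  # single integer rescale per layer

# Dequantize only to verify against the float reference; the accelerator
# itself would pass y_q directly to the next integer layer.
err = np.abs(y_q * s_y - y_float).max()
```

In hardware terms, the multiply-and-shift replaces two floating-point scaling stages with one integer multiplier per layer, which is what makes a pure-integer datapath with small on-chip memory feasible.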

Details

Language :
English
ISSN :
2079-9292
Database :
OpenAIRE
Journal :
Electronics; Volume 12; Issue 12; Pages: 2660
Accession number :
edsair.multidiscipl..1a6208eca67aa5959fc052213046a1a3
Full Text :
https://doi.org/10.3390/electronics12122660