
Unified Scaling-Based Pure-Integer Quantization for Low-Power Accelerator of Complex CNNs

Authors :
Ali A. Al-Hamid; HyungWon Kim
Source :
Electronics; Volume 12; Issue 12; Pages: 2660
Publication Year :
2023
Publisher :
Multidisciplinary Digital Publishing Institute, 2023.

Abstract

Although optimizing deep neural networks is becoming crucial for deploying the networks on edge AI devices, it faces increasing challenges due to scarce hardware resources in modern IoT and mobile devices. This study proposes a quantization method that can quantize all internal computations and parameters in memory. Unlike most previous methods, which primarily focused on relatively simple CNN models for image classification, the proposed method, Unified Scaling-Based Pure-Integer Quantization (USPIQ), can handle more complex CNN models for object detection. USPIQ provides a systematic approach to converting all floating-point operations into pure-integer operations in every model layer. This significantly reduces the computational overhead and makes the model more suitable for low-power neural network accelerator hardware consisting of pure-integer datapaths and small memory, aimed at low power consumption and small chip size. The proposed method optimally calibrates the scale parameters for each layer using a subset of unlabeled representative images. Furthermore, we introduce the notion of a Unified Scale Factor (USF), which combines the conventional two-step scaling process (quantization and dequantization) into a single process for each layer. As a result, it improves both the inference speed and the accuracy of the resulting quantized model. Our experiment on YOLOv5 models demonstrates that USPIQ can significantly reduce the on-chip memory for parameters and activation data by approximately 75% and 43.68%, respectively, compared with the floating-point model. These reductions are achieved with a minimal loss in mAP@0.5 of at most 0.61%. In addition, USPIQ exhibits a significant improvement in inference speed compared to ONNX Runtime quantization, achieving a speedup of 1.64 to 2.84 times. We also demonstrate that USPIQ outperforms previous methods in terms of accuracy and hardware reduction for 8-bit quantization of all YOLOv5 versions.
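The core idea of the Unified Scale Factor can be illustrated with a minimal sketch: instead of quantizing inputs and dequantizing outputs as two separate scaling steps, the combined factor is applied once per layer as a fixed-point integer multiply-and-shift, so the datapath never touches floating point. This is a simplified illustration of the general technique, not the paper's implementation; the variable names, the 16-bit shift width, and the toy layer are assumptions.

```python
import numpy as np

def quantize(x, scale):
    """Symmetric per-tensor quantization of floats to the int8 range."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int64)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8)).astype(np.float32)   # toy layer weights
x = rng.normal(size=8).astype(np.float32)        # input activations
y_float = W @ x                                  # float reference output

# Scales calibrated from data (the paper calibrates each layer using
# a subset of unlabeled representative images).
s_w = np.abs(W).max() / 127
s_x = np.abs(x).max() / 127
s_y = np.abs(y_float).max() / 127

# Unified scale: fold the quantize and dequantize scalings into one
# per-layer factor s_w * s_x / s_y, realized as a fixed-point multiplier
# M with a right shift so the computation stays pure-integer.
SHIFT = 16  # assumed fixed-point precision
M = int(round((s_w * s_x / s_y) * (1 << SHIFT)))

W_q, x_q = quantize(W, s_w), quantize(x, s_x)
acc = W_q @ x_q           # integer accumulator
y_q = (acc * M) >> SHIFT  # single integer rescale per layer

# Dequantize only to verify against the float reference; the accelerator
# itself would pass y_q directly to the next integer layer.
err = np.abs(y_q * s_y - y_float).max()
```

In hardware terms, the multiply-and-shift replaces two floating-point scaling stages with one integer multiplier per layer, which is what makes a pure-integer datapath with small on-chip memory feasible.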

Details

Language :
English
ISSN :
2079-9292
Database :
OpenAIRE
Journal :
Electronics; Volume 12; Issue 12; Pages: 2660
Accession number :
edsair.multidiscipl..1a6208eca67aa5959fc052213046a1a3
Full Text :
https://doi.org/10.3390/electronics12122660