1. Quantized CNN: A Unified Approach to Accelerate and Compress Convolutional Networks.
- Authors
- Cheng, Jian; Wu, Jiaxiang; Leng, Cong; Wang, Yuhang; and Hu, Qinghao
- Subjects
- *ARTIFICIAL neural networks, *MACHINE learning, *GEOMETRIC quantization
- Abstract
We are witnessing explosive development and widespread application of deep neural networks (DNNs) across many fields. However, DNN models, especially convolutional neural networks (CNNs), usually involve massive numbers of parameters and are computationally expensive, making them heavily dependent on high-performance hardware. This limits their broader deployment, e.g., in applications on mobile devices. In this paper, we present Quantized CNN, a unified approach to accelerating and compressing convolutional networks. Guided by minimizing the approximation error of each individual layer's response, both fully connected and convolutional layers are carefully quantized. Inference can then be carried out efficiently on the quantized network, with much lower memory and storage consumption. Quantitative evaluation on two publicly available benchmarks demonstrates the promising performance of our approach: with comparable classification accuracy, it achieves 4× to 6× acceleration and 15× to 20× compression. With our method, accurate image classification can even be carried out directly on mobile devices within one second. [ABSTRACT FROM AUTHOR]
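The abstract describes quantizing each layer's weights so that the layer's response is approximated well. A minimal numpy sketch of the underlying idea, product-quantizing a fully connected layer's weight matrix with per-group k-means codebooks, is given below. All layer sizes, group sizes, and variable names here are illustrative assumptions; the paper's actual method additionally minimizes the response error directly during codebook learning and uses precomputed lookup tables at inference time, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fully connected layer: 256 inputs -> 128 outputs
# (sizes are illustrative, not taken from the paper).
W = rng.normal(size=(256, 128)).astype(np.float32)


def kmeans(X, K, iters=20, seed=0):
    """Plain k-means; returns centroids and point-to-centroid assignments."""
    r = np.random.default_rng(seed)
    C = X[r.choice(len(X), size=K, replace=False)].copy()
    for _ in range(iters):
        assign = np.argmin(((X[:, None, :] - C[None, :, :]) ** 2).sum(-1), axis=1)
        for k in range(K):
            if np.any(assign == k):
                C[k] = X[assign == k].mean(axis=0)
    return C, assign


d, K = 4, 16                       # subvector length, codebook size (assumed)
M = W.shape[0] // d                # number of input-dimension groups
W_hat = np.empty_like(W)
for m in range(M):
    sub = W[m * d:(m + 1) * d, :].T            # (128, d) subvectors of this group
    C, assign = kmeans(sub, K)
    W_hat[m * d:(m + 1) * d, :] = C[assign].T  # snap each subvector to its centroid

# The paper quantizes so as to minimize the error of the *layer response*,
# not just the weights; here we simply measure that response error.
x = rng.normal(size=(32, W.shape[0])).astype(np.float32)
err = np.linalg.norm(x @ W - x @ W_hat) / np.linalg.norm(x @ W)

# Storage cost: float32 codebooks + per-subvector codes of log2(K) bits each.
bits_orig = W.size * 32
bits_quant = M * K * d * 32 + M * W.shape[1] * int(np.log2(K))
ratio = bits_orig / bits_quant
print(f"relative response error: {err:.3f}, compression: {ratio:.1f}x")
```

Storing only the small codebooks plus 4-bit codes, rather than full float32 weights, is what yields the memory savings; the reported 4–6× speedup comes from replacing inner products with table lookups, which is not shown here.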
- Published
- 2018