1. Novel accelerated methods for convolution neural network with matrix core.
- Authors
- Guo, Yijie; Lu, Lu; Zhu, Songxiang
- Subjects
- *CONVOLUTIONAL neural networks; *DEEP learning; *PARALLEL programming; *MATRICES (Mathematics)
- Abstract
The powerful parallel computing capability of GPUs and the recent development of matrix processing units offer new opportunities to improve the performance of convolutional neural networks (CNNs) on GPUs. For the Winograd convolution algorithm, the most widely used and best-performing convolution method in CNNs, several tuning efforts already exist, but they all neglect the matrix operation units and therefore fail to exploit the GPU's full computing resources. This paper presents a single-precision accelerated CNN solution on GPUs. Guided by architectural indicators, the optimal data layout, grid division, and block division are derived. To handle the variety of padding configurations that arise in practice, an efficient dynamic padding scheme is designed, and a pipelined algorithm with operator fusion is implemented on top of the matrix cores. AMD's deep learning acceleration library MIOpen serves as the baseline. Using several convolutional layers of ResNet50 as experimental input, the evaluation shows that the proposed approach outperforms MIOpen with a speedup of 1.41x on the MI210 and reaches 74% of the GPU's single-precision peak performance. Applied to the training and inference of ResNet50, it achieves a speedup of 1.68x. [ABSTRACT FROM AUTHOR]
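For readers unfamiliar with the Winograd algorithm the abstract builds on, the following is a minimal sketch of the standard one-dimensional F(2,3) minimal-filtering case (two outputs of a 3-tap filter). The transform matrices `B_T`, `G`, and `A_T` are the well-known Winograd/Toom-Cook constants, not taken from the paper; the paper's contribution lies in mapping the multi-channel 2-D version of this computation onto GPU matrix cores, which this sketch does not attempt.

```python
def matvec(M, v):
    """Multiply a small matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Standard Winograd F(2,3) transforms: input (B^T), filter (G), output (A^T).
B_T = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
G   = [[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]]
A_T = [[1, 1, 1, 0], [0, 1, -1, -1]]

def winograd_f23(d, g):
    """Two outputs of a 3-tap convolution over a 4-element input tile,
    using 4 elementwise multiplies instead of the direct method's 6."""
    U = matvec(G, g)                    # transformed filter
    V = matvec(B_T, d)                  # transformed input tile
    M = [u * v for u, v in zip(U, V)]   # elementwise (Hadamard) product
    return matvec(A_T, M)               # inverse transform to outputs

def direct_conv(d, g):
    """Reference: direct sliding-window correlation."""
    return [sum(d[i + j] * g[j] for j in range(3)) for i in range(2)]
```

In the 2-D, multi-channel setting, the elementwise-product stage becomes a batch of small matrix multiplications across input channels; that GEMM-shaped stage is the part a Winograd implementation can hand to the GPU's matrix cores, which is where the paper's optimizations apply.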
- Published
- 2023