Acceleration and Optimization of Group Convolution Computation on the Sunway Many-Core Architecture.
- Source :
- Application Research of Computers / Jisuanji Yingyong Yanjiu. Jun2023, Vol. 40 Issue 6, p1745-1749. 5p.
- Publication Year :
- 2023
Abstract
- To address high computational complexity, large computational cost, and large parameter counts, this paper proposes a parallel group convolution algorithm for the domestically developed SW26010P many-core processor. The core idea is to use a dedicated data layout and map the computation across the processor's cores so that it runs in parallel. Experimental results show that, compared with a single-core serial algorithm, the proposed parallel group convolution algorithm achieves a maximum speedup of 79.5× and a maximum effective computing power of 186.7 MFLOPS. After further data-parallel optimization with SIMD instructions, the algorithm achieves a maximum speedup of 10.2× over the parallel group convolution algorithm before optimization. [ABSTRACT FROM AUTHOR]
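
For readers unfamiliar with group convolution, the following is a minimal illustrative sketch in plain Python, not the authors' implementation. It assumes stride 1, no padding, and square kernels, and it does not model the SW26010P cores, the paper's data layout, or SIMD instructions. The function names `group_conv2d` and `mac_count` are hypothetical. The sketch shows that each group reads only its own slice of input channels, so the groups are independent units of work that could be assigned to different cores.

```python
def group_conv2d(x, w, groups):
    """Grouped 2D convolution (stride 1, no padding).

    x: input as nested lists, shape [C_in][H][W]
    w: weights as nested lists, shape [C_out][C_in // groups][K][K]
    """
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    c_out, k = len(w), len(w[0][0])
    cin_g, cout_g = c_in // groups, c_out // groups
    oh, ow = h - k + 1, wd - k + 1
    out = [[[0.0] * ow for _ in range(oh)] for _ in range(c_out)]
    # Each iteration of this loop touches a disjoint set of input and
    # output channels, so the groups can be computed in parallel.
    for g in range(groups):
        for oc in range(g * cout_g, (g + 1) * cout_g):
            for ic in range(cin_g):
                src = x[g * cin_g + ic]   # input channel owned by group g
                ker = w[oc][ic]
                for i in range(oh):
                    for j in range(ow):
                        acc = 0.0
                        for di in range(k):
                            for dj in range(k):
                                acc += src[i + di][j + dj] * ker[di][dj]
                        out[oc][i][j] += acc
    return out


def mac_count(c_in, c_out, k, oh, ow, groups):
    """Multiply-accumulate operations for one grouped convolution."""
    return (c_in // groups) * c_out * k * k * oh * ow
```

With `groups=1` this reduces to an ordinary convolution. For a fixed layer shape, `mac_count` and the number of weights both shrink by a factor of `groups`, which is the reduction in computation and parameters that motivates the technique.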
Details
- Language :
- Chinese
- ISSN :
- 1001-3695
- Volume :
- 40
- Issue :
- 6
- Database :
- Academic Search Index
- Journal :
- Application Research of Computers / Jisuanji Yingyong Yanjiu
- Publication Type :
- Academic Journal
- Accession number :
- 169823958
- Full Text :
- https://doi.org/10.19734/j.issn.1001-3695.2022.10.0559