BLAS3 optimization for the Godson-3B1500
- Source :
- SpringerPlus
- Publication Year :
- 2016
- Publisher :
- Springer Science and Business Media LLC, 2016.
Abstract
- This paper proposes a performance model for general matrix multiplication (GEMM) on decoupled access/execute (DAE) architectures, to guide GEMM optimization on the Godson-3B1500. The model focuses on the features of the access processors (APs) and execute processors (EPs). To reduce the synchronization overhead between APs and EPs, a synchronization module selection mechanism (SMSM) is presented. Based on the performance model, two optimized GEMM algorithms for DAE platforms are proposed. In these algorithms, the kernel functions are optimized with single instruction, multiple data (SIMD) vector instructions, and the AP overhead is almost entirely overlapped with EP execution by exploiting the features of the architecture. The SMSM further reduces the synchronization overhead. Finally, the proposed algorithms are tested on the Godson-3B1500. The experimental results show that dGEMM reaches 91.9% of the theoretical peak performance and that zGEMM can reach 93% of the theoretical peak performance.
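As a rough illustration of the access/execute split described in the abstract, the sketch below separates a blocked GEMM into a data-movement step and a compute step. It is not the paper's algorithm: the function names (`pack_block`, `kernel`, `blocked_gemm`) and the block size `bs` are hypothetical, the code runs sequentially, and it models neither SIMD instructions nor AP/EP synchronization.

```python
# Minimal blocked GEMM sketch: C += A * B on lists of lists.
# Assumption: pack_block stands in for the "access" role (moving tiles),
# kernel stands in for the "execute" role (multiply-accumulate on tiles).

def pack_block(M, r0, c0, rows, cols):
    """Copy a rows x cols tile of M starting at (r0, c0) into a new buffer."""
    return [M[r][c0:c0 + cols] for r in range(r0, r0 + rows)]

def kernel(a_tile, b_tile, C, i0, j0):
    """Multiply two packed tiles and accumulate into C at offset (i0, j0)."""
    for i, a_row in enumerate(a_tile):
        c_row = C[i0 + i]
        for p, a_ip in enumerate(a_row):
            b_row = b_tile[p]
            for j, b_pj in enumerate(b_row):
                c_row[j0 + j] += a_ip * b_pj

def blocked_gemm(A, B, C, bs=2):
    """Accumulate A (m x k) times B (k x n) into C (m x n), tile by tile."""
    m, k, n = len(A), len(B), len(B[0])
    for i0 in range(0, m, bs):
        for p0 in range(0, k, bs):
            # Edge tiles are smaller when bs does not divide the dimension.
            a_tile = pack_block(A, i0, p0, min(bs, m - i0), min(bs, k - p0))
            for j0 in range(0, n, bs):
                b_tile = pack_block(B, p0, j0, min(bs, k - p0), min(bs, n - j0))
                kernel(a_tile, b_tile, C, i0, j0)
    return C
```

On a DAE platform the point of such a split is that fetching the next tile can proceed while the current tile is being multiplied. This sequential sketch shows only the division of work, not the overlap.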
- Subjects :
- Multidisciplinary
Computer science
Research
020207 software engineering
010103 numerical & computational mathematics
02 engineering and technology
Parallel computing
01 natural sciences
Basic Linear Algebra Subprograms
BLAS
Godson-3B1500
Synchronization (computer science)
0202 electrical engineering, electronic engineering, information engineering
Overhead (computing)
Multiplication
General matrix
SIMD
DAE
0101 mathematics
Performance model
Performance optimization
Details
- ISSN :
- 21931801
- Volume :
- 5
- Database :
- OpenAIRE
- Journal :
- SpringerPlus
- Accession number :
- edsair.doi.dedup.....c591d5f98fd17e23dfb92c4b47100b1b