Performance analysis and optimization for SpMV based on aligned storage formats on an ARM processor.
- Authors
- Zhang, Yufeng; Yang, Wangdong; Li, Kenli; Tang, Dahai; Li, Keqin
- Subjects
- *ARM microprocessors, *ALGORITHMS, *SCIENTIFIC computing, *ELECTRONIC data processing, *SPARSE matrices, *PARALLEL algorithms
- Abstract
• Propose the aligned storage formats ACSR and AELL.
• Put forward a parallel SpMV algorithm based on the ACSR and AELL formats.
• Theoretically analyze the performance of SpMV on ARM processors.
• Evaluate the performance of our optimization on Kunpeng 920 processors.

Sparse matrix-vector multiplication (SpMV) has always been a hot research topic in scientific computing and big data processing, but the sparsity and discontinuity of the nonzero elements in a sparse matrix make memory access the bottleneck of SpMV. In this paper, we propose the aligned CSR (ACSR) and aligned ELL (AELL) formats and a parallel SpMV algorithm that utilizes NEON SIMD registers on ARM processors. We analyze the impact of SIMD instruction latency, cache access, and cache misses on SpMV with different formats. In the experiments, our SpMV algorithm based on ACSR achieves 1.18x and 1.56x speedups over SpMV based on CSR and over SpMV in PETSc, respectively, and AELL achieves a 1.21x speedup over ELL. The deviations between the theoretical and experimental results for instruction latency and cache access are 10.26% and 10.51% for ACSR, and 5.68% and 2.91% for AELL, respectively. [ABSTRACT FROM AUTHOR]
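The abstract does not define the ACSR layout, so the following is only a minimal sketch under one stated assumption: "aligned" is taken to mean that each row's nonzeros in CSR are padded with explicit zeros (column 0, value 0.0) until the row length is a multiple of the SIMD width, assumed here to be 2 doubles per 128-bit NEON register. The names `csr_t`, `acsr_t`, `csr_to_acsr`, `spmv_csr`, `spmv_acsr`, and `VEC_WIDTH` are hypothetical, and the inner lane loop is portable scalar C standing in for NEON intrinsics; it is not the authors' implementation.

```c
#include <stdlib.h>

/* Assumption: FP64 values, so a 128-bit NEON register holds 2 lanes. */
#define VEC_WIDTH 2

/* Plain CSR matrix (hypothetical type). */
typedef struct {
    int nrows;
    const int *row_ptr;   /* nrows + 1 offsets into col_idx/val */
    const int *col_idx;
    const double *val;
} csr_t;

/* Hypothetical "aligned" CSR: every row length is a multiple of VEC_WIDTH. */
typedef struct {
    int nrows;
    int *row_ptr;
    int *col_idx;
    double *val;
} acsr_t;

/* Baseline CSR SpMV: y = A * x, with a data-dependent remainder per row. */
void spmv_csr(const csr_t *A, const double *x, double *y) {
    for (int i = 0; i < A->nrows; i++) {
        double sum = 0.0;
        for (int k = A->row_ptr[i]; k < A->row_ptr[i + 1]; k++)
            sum += A->val[k] * x[A->col_idx[k]];
        y[i] = sum;
    }
}

/* Pad each row to a multiple of VEC_WIDTH; padding slots hold col 0, val 0.0
 * (zeroed by calloc), so they add 0.0 * x[0] to the row sum. */
acsr_t csr_to_acsr(const csr_t *A) {
    acsr_t B;
    B.nrows = A->nrows;
    B.row_ptr = malloc((size_t)(A->nrows + 1) * sizeof(int));
    B.row_ptr[0] = 0;
    for (int i = 0; i < A->nrows; i++) {
        int len = A->row_ptr[i + 1] - A->row_ptr[i];
        int padded = (len + VEC_WIDTH - 1) / VEC_WIDTH * VEC_WIDTH;
        B.row_ptr[i + 1] = B.row_ptr[i] + padded;
    }
    size_t n = (size_t)B.row_ptr[A->nrows];
    B.col_idx = calloc(n > 0 ? n : 1, sizeof(int));
    B.val = calloc(n > 0 ? n : 1, sizeof(double));
    for (int i = 0; i < A->nrows; i++) {
        int dst = B.row_ptr[i];
        for (int k = A->row_ptr[i]; k < A->row_ptr[i + 1]; k++, dst++) {
            B.col_idx[dst] = A->col_idx[k];
            B.val[dst] = A->val[k];
        }
    }
    return B;
}

/* Aligned SpMV: the row loop steps by whole vectors, with no remainder. */
void spmv_acsr(const acsr_t *B, const double *x, double *y) {
    for (int i = 0; i < B->nrows; i++) {
        double acc[VEC_WIDTH] = {0.0};       /* one accumulator per SIMD lane */
        for (int k = B->row_ptr[i]; k < B->row_ptr[i + 1]; k += VEC_WIDTH)
            for (int l = 0; l < VEC_WIDTH; l++)
                acc[l] += B->val[k + l] * x[B->col_idx[k + l]];
        double sum = 0.0;                    /* horizontal reduction of lanes */
        for (int l = 0; l < VEC_WIDTH; l++)
            sum += acc[l];
        y[i] = sum;
    }
}
```

The padding trades a few extra multiply-adds by zero for a loop with no per-row tail, which is the kind of regularity SIMD code benefits from. The matrix needs at least one column, and a non-finite `x[0]` would contaminate padded rows, so a real format would need to handle that case.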
- Published
- 2021