AutoTSMM: An Auto-tuning Framework for Building High-Performance Tall-and-Skinny Matrix-Matrix Multiplication on CPUs

Authors :
Li, Chendi
Jia, Haipeng
Cao, Hang
Yao, Jianyu
Shi, Boqian
Xiang, Chunyang
Sun, Jinbo
Lu, Pengqi
Zhang, Yunquan
Publication Year :
2022

Abstract

In recent years, general matrix-matrix multiplication with non-regular-shaped input matrices has been widely used in applications such as deep learning and has drawn increasing attention. However, conventional implementations are not suited to non-regular-shaped matrix-matrix multiplication, and few works focus on optimizing tall-and-skinny matrix-matrix multiplication on CPUs. This paper proposes an auto-tuning framework, AutoTSMM, to build high-performance tall-and-skinny matrix-matrix multiplication. AutoTSMM selects the optimal inner kernels in the install-time stage and generates an execution plan for the pre-pack tall-and-skinny matrix-matrix multiplication in the runtime stage. Experiments demonstrate that AutoTSMM achieves performance competitive with state-of-the-art tall-and-skinny matrix-matrix multiplication and outperforms all conventional matrix-matrix multiplication implementations.

Comment: 8 pages, 12 figures, published in IEEE ISPA 2021
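The two stages named in the abstract (kernel selection at install time, then an execution plan over a pre-packed matrix at run time) can be illustrated with a minimal Python sketch. This is not AutoTSMM's code or API: every name here (`pack_columns`, `kernel_dot`, `kernel_loop`, `select_kernel`, `make_plan`, `tsmm`) is hypothetical, the two candidate kernels are toy stand-ins for the tuned inner kernels, and the "plan" is assumed to be a simple row-blocking of the tall matrix.

```python
import time


def pack_columns(b):
    """Pre-pack B (k x n) once into n contiguous columns (assumed layout)."""
    return [list(col) for col in zip(*b)]


def kernel_dot(a_block, b_cols):
    """Toy candidate kernel 1: one generator-based dot product per output."""
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a_block]


def kernel_loop(a_block, b_cols):
    """Toy candidate kernel 2: same result, explicit accumulation loop."""
    out = []
    for row in a_block:
        acc = []
        for col in b_cols:
            s = 0.0
            for x, y in zip(row, col):
                s += x * y
            acc.append(s)
        out.append(acc)
    return out


def select_kernel(candidates, k, n, rows=32, repeats=3):
    """'Install-time' stage: time each candidate on a synthetic block
    with inner dimension k and n output columns, and keep the fastest."""
    a_block = [[float(i + j) for j in range(k)] for i in range(rows)]
    b_cols = [[float(i - j) for i in range(k)] for j in range(n)]
    best, best_time = None, float("inf")
    for kernel in candidates:
        start = time.perf_counter()
        for _ in range(repeats):
            kernel(a_block, b_cols)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = kernel, elapsed
    return best


def make_plan(m, block_rows):
    """'Runtime' stage: split the m rows of the tall matrix into blocks."""
    return [(lo, min(lo + block_rows, m)) for lo in range(0, m, block_rows)]


def tsmm(a, b_cols, kernel, plan):
    """Execute the plan: apply the selected kernel to each row block."""
    c = []
    for lo, hi in plan:
        c.extend(kernel(a[lo:hi], b_cols))
    return c
```

A caller would run `select_kernel` once per target shape, pack the small matrix once with `pack_columns`, and then reuse both across many calls to `tsmm`, which is the reuse pattern that makes tuning and pre-packing worthwhile.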

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2208.08088
Document Type :
Working Paper
Full Text :
https://doi.org/10.1109/ISPA-BDCloud-SocialCom-SustainCom52081.2021.00034