VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs

Authors :: Jeong, Geonhwa
Damani, Sana
Bambhaniya, Abhimanyu Rajeshkumar
Qin, Eric
Hughes, Christopher J.
Subramoney, Sreenivas
Kim, Hyesoon
Krishna, Tushar
Publication Year :: 2023
Abstract: Deep Learning (DL) acceleration support in CPUs has recently gained a lot of traction, with several companies (Arm, Intel, IBM) announcing products with specialized matrix engines accessible via GEMM instructions. CPUs are pervasive and need to handle diverse requirements across DL workloads running in edge/HPC/cloud platforms. Therefore, as DL workloads embrace sparsity to reduce the computations and memory size of models, it is also imperative for CPUs to add support for sparsity to avoid under-utilization of the dense matrix engine and inefficient usage of the caches and registers. This work presents VEGETA, a set of ISA and microarchitecture extensions over dense matrix engines to support flexible structured sparsity for CPUs, enabling programmable support for diverse DL models with varying degrees of sparsity. Compared to the state-of-the-art (SOTA) dense matrix engine in CPUs, a VEGETA engine provides 1.09x, 2.20x, 3.74x, and 3.28x speed-ups when running 4:4 (dense), 2:4, 1:4, and unstructured (95%) sparse DNN layers.
Comment: This paper is accepted to HPCA 2023
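
Note on the N:M notation: the abstract's 4:4, 2:4, and 1:4 labels follow the usual structured-sparsity convention, in which at most N values in every block of M consecutive weights are nonzero. The sketch below only illustrates that convention with magnitude-based pruning; it is not taken from the paper and does not model VEGETA's instructions or hardware. The function name `prune_n_m` and the sample weights are hypothetical.

```python
def prune_n_m(row, n, m):
    """Zero all but the n largest-magnitude values in each block of m values.

    Illustrative only: magnitude-based selection is an assumption made for
    this sketch, not a method described in the abstract.
    """
    if not 0 <= n <= m:
        raise ValueError("n must satisfy 0 <= n <= m")
    if len(row) % m != 0:
        raise ValueError("row length must be a multiple of m")

    pruned = []
    for start in range(0, len(row), m):
        block = row[start:start + m]
        # Rank positions by descending magnitude; ties go to the lower index.
        ranked = sorted(range(m), key=lambda i: (-abs(block[i]), i))
        keep = set(ranked[:n])
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(block))
    return pruned


# Hypothetical weights: two blocks of four values each.
weights = [0.5, -2.0, 0.1, 1.5, 3.0, 0.0, -0.2, 0.4]
dense = prune_n_m(weights, 4, 4)       # 4:4 keeps every value
half = prune_n_m(weights, 2, 4)        # 2:4 keeps two values per block
quarter = prune_n_m(weights, 1, 4)     # 1:4 keeps one value per block
```

Under this convention a 2:4 row stores at most half of its values and a 1:4 row at most a quarter, which is the regularity a structured-sparse matrix engine can exploit.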

Subjects :: Computer Science - Hardware Architecture
Computer Science - Artificial Intelligence
Computer Science - Machine Learning
