
CPSAA: Accelerating Sparse Attention Using Crossbar-Based Processing-In-Memory Architecture

Authors :
Li, Huize
Jin, Hai
Zheng, Long
Liao, Xiaofei
Huang, Yu
Liu, Cong
Xu, Jiahong
Duan, Zhuohui
Chen, Dan
Gui, Chuangyi
Source :
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems; 2024, Vol. 43, Issue 6, pp. 1741-1754, 14p
Publication Year :
2024

Abstract

Attention-based neural networks have attracted great interest due to their excellent accuracy. However, the attention mechanism spends substantial computation on unnecessary calculations, significantly limiting system performance. To reduce these unnecessary calculations, researchers have proposed sparse attention, which converts some dense–dense matrix multiplication (DDMM) operations into sampled dense–dense matrix multiplication (SDDMM) and sparse matrix multiplication (SpMM) operations. However, current sparse attention solutions introduce massive off-chip random memory accesses because the sparse attention matrix is generally unstructured. We propose CPSAA, a novel crossbar-based processing-in-memory (PIM) sparse attention accelerator that eliminates off-chip data transmissions. 1) We present a novel attention calculation mode to balance crossbar writing and crossbar processing latency. 2) We design a novel PIM-based sparsity pruning architecture to eliminate the pruning phase's off-chip data transfers. 3) Finally, we present novel crossbar-based SDDMM and SpMM methods that process unstructured sparse attention matrices by coupling two types of crossbar arrays. Experimental results show that CPSAA achieves average performance improvements of 89.6×, 32.2×, 17.8×, 3.39×, and 3.84× and energy savings of 755.6×, 55.3×, 21.3×, 5.7×, and 4.9× compared with a GPU, a field-programmable gate array (FPGA), SANGER, ReBERT, and ReTransformer, respectively.
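To make the DDMM-to-SDDMM/SpMM conversion concrete, the following is a minimal NumPy sketch of sparse attention for a single head. The Boolean sampling mask, the 1/sqrt(d) scaling, and the dense scratch arrays are illustrative assumptions for clarity; they do not reflect CPSAA's actual crossbar dataflow or pruning method.

```python
import numpy as np

def sparse_attention(Q, K, V, mask):
    """Sparse attention via SDDMM + SpMM (illustrative sketch).

    Q, K, V: (n, d) dense matrices; mask: (n, n) Boolean sampling
    pattern. Scores are computed only at positions where mask is
    True (the SDDMM step); the resulting sparse score matrix then
    multiplies the dense V (the SpMM step). Assumes every row of
    mask has at least one True entry.
    """
    n, d = Q.shape
    scores = np.full((n, n), -np.inf)
    rows, cols = np.nonzero(mask)
    # SDDMM: dot products only at the sampled (row, col) positions,
    # instead of the full n x n DDMM Q @ K.T.
    scores[rows, cols] = np.einsum("id,id->i", Q[rows], K[cols]) / np.sqrt(d)
    # Row-wise softmax restricted to the sampled entries.
    probs = np.where(mask, np.exp(scores - scores.max(axis=1, keepdims=True)), 0.0)
    probs /= probs.sum(axis=1, keepdims=True)
    # SpMM: sparse probability matrix times dense V.
    return probs @ V

# Usage: a random mask with the diagonal kept so no row is empty.
rng = np.random.default_rng(0)
n, d = 8, 16
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
mask = rng.random((n, n)) < 0.3
mask[np.arange(n), np.arange(n)] = True
out = sparse_attention(Q, K, V, mask)  # shape (n, d)
```

With unstructured masks like this one, the sampled positions are scattered, which is exactly what causes the off-chip random memory accesses the abstract describes and what CPSAA's coupled crossbar arrays are designed to handle in memory.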

Details

Language :
English
ISSN :
0278-0070
Volume :
43
Issue :
6
Database :
Supplemental Index
Journal :
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Publication Type :
Periodical
Accession number :
ejs66457186
Full Text :
https://doi.org/10.1109/TCAD.2023.3344524