AttentionLego: An Open-Source Building Block For Spatially-Scalable Large Language Model Accelerator With Processing-In-Memory Technology
- Publication Year :
- 2024
Abstract
- Large language models (LLMs) with Transformer architectures have achieved phenomenal success in natural language processing, multimodal generative artificial intelligence, and agent-oriented artificial intelligence. The self-attention module is the dominant sub-structure inside Transformer-based LLMs. Computing it on general-purpose graphics processing units (GPUs) places heavy demands on I/O bandwidth for transferring intermediate results between memory and processing units. To tackle this challenge, this work develops a fully customized vanilla self-attention accelerator, AttentionLego, as the basic building block for constructing spatially expandable LLM processors. AttentionLego provides a basic implementation in fully customized digital logic incorporating Processing-In-Memory (PIM) technology. It is based on PIM-based matrix-vector multiplication and a look-up-table-based Softmax design. The open-source code is available online: https://bonany.cc/attentionleg.
- Comment: For associated source code, see https://bonany.cc/attentionleg
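To illustrate the two kernels named in the abstract, the following is a minimal NumPy sketch of vanilla single-head self-attention evaluated one token at a time, so that each step is a matrix-vector product (the operation a PIM array would carry out), with the exponential in Softmax replaced by a look-up table. The function names, LUT bit width, and input range are illustrative assumptions, not taken from the AttentionLego design.

```python
import numpy as np

def softmax_lut(x, lut_bits=8, x_min=-8.0, x_max=0.0):
    # Softmax with exp() replaced by a look-up table; lut_bits, x_min, x_max
    # are hypothetical parameters chosen for this sketch, not the paper's.
    grid = np.linspace(x_min, x_max, 2 ** lut_bits)
    lut = np.exp(grid)
    # Shift for numerical stability, clamp to the LUT's input range, then index.
    shifted = np.clip(x - x.max(axis=-1, keepdims=True), x_min, x_max)
    idx = np.round((shifted - x_min) / (x_max - x_min) * (2 ** lut_bits - 1)).astype(int)
    e = lut[idx]
    return e / e.sum(axis=-1, keepdims=True)

def vanilla_self_attention(X, Wq, Wk, Wv):
    # Single-head self-attention evaluated one token at a time, so every
    # step is a matrix-vector product, the kernel a PIM array would execute.
    n, d = X.shape[0], Wq.shape[1]
    K = X @ Wk                               # keys for all tokens, shape (n, d)
    V = X @ Wv                               # values for all tokens
    out = np.empty((n, Wv.shape[1]))
    for t in range(n):
        q_t = Wq.T @ X[t]                    # matrix-vector: query for token t
        scores = (K @ q_t) / np.sqrt(d)      # matrix-vector: attention scores
        probs = softmax_lut(scores)          # LUT-based Softmax over the scores
        out[t] = V.T @ probs                 # matrix-vector: weighted sum of values
    return out

# Example usage with random data (shapes are illustrative).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((16, 64))        # 16 tokens, model width 64
    Wq, Wk, Wv = (rng.standard_normal((64, 64)) * 0.1 for _ in range(3))
    print(vanilla_self_attention(X, Wq, Wk, Wv).shape)   # (16, 64)
```

In a hardware realization the weight matrices would reside in the PIM arrays and the exponential table in a small on-chip memory; this sketch only mirrors that dataflow in software.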
Details
- Database :
- OAIster
- Publication Type :
- Electronic Resource
- Accession number :
- edsoai.on1438518015
- Document Type :
- Electronic Resource