1. AttentionLego: An Open-Source Building Block For Spatially-Scalable Large Language Model Accelerator With Processing-In-Memory Technology
- Author
Cong, Rongqing, He, Wenyang, Li, Mingxuan, Luo, Bangning, Yang, Zebin, Yang, Yuchao, Huang, Ru, and Yan, Bonan
- Subjects
Computer Science - Hardware Architecture, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Large language models (LLMs) with Transformer architectures have become phenomenal in natural language processing, multimodal generative artificial intelligence, and agent-oriented artificial intelligence. The self-attention module is the dominant sub-structure inside Transformer-based LLMs, and computing it on general-purpose graphics processing units (GPUs) places heavy demands on I/O bandwidth for transferring intermediate calculation results between memories and processing units. To tackle this challenge, this work develops a fully customized vanilla self-attention accelerator, AttentionLego, as the basic building block for constructing spatially scalable LLM processors. AttentionLego provides a basic implementation in fully customized digital logic incorporating Processing-In-Memory (PIM) technology; it is based on PIM-based matrix-vector multiplication and a look-up table-based Softmax design. The open-source code is available online: https://bonany.cc/attentionleg.
Comment: for associated source codes, see https://bonany.cc/attentionleg
- Published
2024
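
The abstract above describes vanilla self-attention built from PIM-based matrix-vector multiplication and a look-up table (LUT) Softmax. The sketch below is only a software analogue of that general idea, not the AttentionLego hardware design or its interfaces; the table size, input range, and projection dimensions are assumptions chosen for illustration.

```python
# Hedged sketch: LUT-based Softmax inside vanilla self-attention.
# This is NOT the AttentionLego implementation; it only illustrates replacing
# exp() with nearest-entry table reads, under assumed LUT parameters.
import numpy as np

def build_exp_lut(num_entries=256, x_min=-8.0, x_max=0.0):
    """Precompute exp(x) over a clipped input range, as a LUT would store it."""
    xs = np.linspace(x_min, x_max, num_entries)
    return xs, np.exp(xs)

def lut_softmax(scores, lut_x, lut_y):
    """Softmax whose exponential is served from the precomputed table."""
    shifted = scores - scores.max(axis=-1, keepdims=True)   # numerically safe
    shifted = np.clip(shifted, lut_x[0], lut_x[-1])          # stay inside LUT range
    idx = np.searchsorted(lut_x, shifted).clip(0, len(lut_x) - 1)
    e = lut_y[idx]
    return e / e.sum(axis=-1, keepdims=True)

def vanilla_self_attention(x, Wq, Wk, Wv, lut_x, lut_y):
    """Q/K/V projections (matrix-vector products per token), then LUT Softmax."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return lut_softmax(scores, lut_x, lut_y) @ V

# Toy usage with random weights; dimensions are illustrative only.
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))
lut_x, lut_y = build_exp_lut()
out = vanilla_self_attention(x, Wq, Wk, Wv, lut_x, lut_y)
print(out.shape)  # (4, 8)
```

In a PIM realization, the projection and score matrix-vector products would be evaluated in the memory arrays themselves rather than with `@`, and the LUT would be a small on-chip table; the software version only shows why quantizing the Softmax input range keeps the table small.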