
Efficient Transformers: A Survey.

Authors :
TAY, YI
DEHGHANI, MOSTAFA
BAHRI, DARA
METZLER, DONALD
Source :
ACM Computing Surveys. Jul 2023, Vol. 55, Issue 6, p1-28. 28p.
Publication Year :
2023

Abstract

Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision, and reinforcement learning. In the field of natural language processing for example, Transformers have become an indispensable staple in the modern deep learning stack. Recently, a dizzying number of “X-former” models have been proposed—Reformer, Linformer, Performer, Longformer, to name a few—which improve upon the original Transformer architecture, many of which make improvements around computational and memory efficiency. With the aim of helping the avid researcher navigate this flurry, this article characterizes a large and thoughtful selection of recent efficiency-flavored “X-former” models, providing an organized and comprehensive overview of existing work and models across multiple domains. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
0360-0300
Volume :
55
Issue :
6
Database :
Academic Search Index
Journal :
ACM Computing Surveys
Publication Type :
Academic Journal
Accession number :
160660668
Full Text :
https://doi.org/10.1145/3530811