1. Energy Transformer
- Authors
- Benjamin Hoover, Yuchen Liang, Bao Pham, Rameswar Panda, Hendrik Strobelt, Duen Horng Chau, Mohammed J. Zaki, and Dmitry Krotov
- Subjects
- Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML); Neurons and Cognition (q-bio.NC); Disordered Systems and Neural Networks (cond-mat.dis-nn)
- Abstract
Transformers have become the de facto models of choice in machine learning, typically leading to impressive performance on many applications. At the same time, architectural development in the transformer world is mostly driven by empirical findings, and the theoretical understanding of the architectural building blocks is rather limited. In contrast, Dense Associative Memory models, also known as Modern Hopfield Networks, have a well-established theoretical foundation but have not yet demonstrated truly impressive practical results. We propose a transformer architecture that replaces the sequence of feedforward transformer blocks with a single large Associative Memory model. Our novel architecture, called Energy Transformer (or ET for short), has many of the familiar architectural primitives that are often used in the current generation of transformers. However, it is not identical to the existing architectures. The sequence of transformer layers in ET is purposely designed to minimize a specifically engineered energy function, which is responsible for representing the relationships between the tokens. As a consequence of this computational principle, the attention in ET is different from the conventional attention mechanism. In this work, we introduce the theoretical foundations of ET, explore its empirical capabilities using the image completion task, and obtain strong quantitative results on the graph anomaly detection task.
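The abstract's core idea, updating token representations by descending a shared energy function rather than applying a feedforward stack, can be illustrated with a deliberately simplified sketch. The `energy` function below is a generic Dense-Associative-Memory-style log-sum-exp energy, not the paper's specific ET energy; the array shapes, the `beta` parameter, and the plain gradient-descent loop are illustrative assumptions.

```python
import numpy as np

def energy(tokens, memories, beta=1.0):
    # Simplified Dense Associative Memory energy: negative log-sum-exp of
    # similarities between each token and the stored memory patterns.
    # (Illustrative only; the ET paper engineers a more specific energy.)
    sims = tokens @ memories.T                      # (n_tokens, n_memories)
    return -np.sum(np.log(np.sum(np.exp(beta * sims), axis=1))) / beta

def grad_energy(tokens, memories, beta=1.0):
    # Gradient of the energy w.r.t. the tokens: a softmax-weighted
    # combination of memory patterns (with a minus sign).
    sims = tokens @ memories.T
    weights = np.exp(beta * sims)
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over memories
    return -weights @ memories

def minimize(tokens, memories, steps=100, lr=0.1):
    # Token representations evolve by gradient descent on the energy,
    # standing in for the sequence of "layers" in an energy-based model.
    for _ in range(steps):
        tokens = tokens - lr * grad_energy(tokens, memories)
    return tokens

rng = np.random.default_rng(0)
memories = rng.normal(size=(4, 8))   # 4 stored patterns, dimension 8
tokens = rng.normal(size=(2, 8))     # 2 tokens to be refined

e0 = energy(tokens, memories)
out = minimize(tokens, memories)
e1 = energy(out, memories)           # strictly lower than e0
```

Because the update rule is exactly a descent step on one scalar energy, every "layer" of processing is guaranteed to reduce that energy, which is the computational principle the abstract attributes to ET's modified attention.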
- Published
- 2023