
Outlier-Efficient Hopfield Layers for Large Transformer-Based Models

Authors:
Hu, Jerry Yao-Chieh
Chang, Pei-Hsuan
Luo, Robin
Chen, Hong-Yu
Li, Weijian
Wang, Wei-Po
Liu, Han
Publication Year:
2024

Abstract

We introduce an Outlier-Efficient Modern Hopfield Model (termed $\mathrm{OutEffHop}$) and use it to address the outlier inefficiency problem of training gigantic transformer-based models. Our main contribution is a novel associative memory model facilitating \textit{outlier-efficient} associative memory retrievals. Interestingly, this memory model manifests a model-based interpretation of an outlier-efficient attention mechanism (${\rm Softmax}_1$): it is an approximation of the memory retrieval process of $\mathrm{OutEffHop}$. Methodologically, this allows us to introduce novel outlier-efficient Hopfield layers as powerful alternatives to traditional attention mechanisms, with superior post-quantization performance. Theoretically, the Outlier-Efficient Modern Hopfield Model retains and improves the desirable properties of standard modern Hopfield models, including fixed point convergence and exponential storage capacity. Empirically, we demonstrate the efficacy of the proposed model across large-scale transformer-based and Hopfield-based models (including BERT, OPT, ViT, and STanHop-Net), benchmarking against state-of-the-art methods like $\mathtt{Clipped\_Softmax}$ and $\mathtt{Gated\_Attention}$. Notably, $\mathrm{OutEffHop}$ achieves an average reduction of 22+\% in average kurtosis and 26+\% in the maximum infinity norm of model outputs across four models. Code is available at \href{https://github.com/MAGICS-LAB/OutEffHop}{GitHub}; models are on \href{https://huggingface.co/collections/magicslabnu/outeffhop-6610fcede8d2cda23009a98f}{Hugging Face Hub}; future updates are on \href{https://arxiv.org/abs/2404.03828}{arXiv}.

Comment: Accepted at ICML 2024; v2 updated to camera-ready version; Code available at https://github.com/MAGICS-LAB/OutEffHop; Models are on Hugging Face: https://huggingface.co/collections/magicslabnu/outeffhop-6610fcede8d2cda23009a98f
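For context on the ${\rm Softmax}_1$ mechanism named in the abstract: it replaces the standard softmax denominator $\sum_j e^{x_j}$ with $1 + \sum_j e^{x_j}$, so an attention head can assign near-zero total weight without producing the extreme logits that drive activation outliers. The following is a minimal PyTorch sketch of attention built on that formula; it illustrates the published definition under stated assumptions and is not the authors' released implementation. The function names (`softmax_1`, `outlier_efficient_attention`) are hypothetical.

```python
import torch

def softmax_1(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Softmax_1(x)_i = exp(x_i) / (1 + sum_j exp(x_j)).

    The extra 1 in the denominator lets the weights sum to less
    than one, so a head can effectively "attend to nothing".
    (Hypothetical helper, not from the OutEffHop codebase.)
    """
    # Shift by max(0, max_j x_j) for numerical stability; including 0
    # keeps the implicit "+1" logit inside the max-shift.
    m = torch.clamp(x.max(dim=dim, keepdim=True).values, min=0.0)
    e = torch.exp(x - m)
    return e / (torch.exp(-m) + e.sum(dim=dim, keepdim=True))

def outlier_efficient_attention(q: torch.Tensor,
                                k: torch.Tensor,
                                v: torch.Tensor) -> torch.Tensor:
    """Scaled dot-product attention with Softmax_1 in place of softmax."""
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return softmax_1(scores, dim=-1) @ v
```

In a transformer, this sketch would be a drop-in replacement for the usual softmax attention; the abstract's claim is that OutEffHop's Hopfield-layer retrieval is the associative-memory model that ${\rm Softmax}_1$ approximates.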

Details

Database:
arXiv
Publication Type:
Report
Accession Number:
edsarx.2404.03828
Document Type:
Working Paper