
Taking a Deep Breath: Enhancing Language Modeling of Large Language Models with Sentinel Tokens

Authors:
Luo, Weiyao
Zheng, Suncong
Xia, Heming
Wang, Weikang
Lei, Yan
Liu, Tianyu
Chen, Shuang
Sui, Zhifang
Publication Year:
2024

Abstract

Large language models (LLMs) have shown promising efficacy across various tasks, becoming powerful tools in numerous aspects of human life. However, Transformer-based LLMs suffer performance degradation when modeling long-term contexts because they discard some information to reduce computational overhead. In this work, we propose a simple yet effective method that enables LLMs to take a deep breath, encouraging them to summarize the information contained within discrete text chunks. Specifically, we segment the text into multiple chunks and insert a special token <SR> at the end of each chunk. We then modify the attention mask so that the chunk's information is integrated into the corresponding <SR> token. This enables LLMs to interpret information not only from historical individual tokens but also from the <SR> token, which aggregates the chunk's semantic information. Experiments on language modeling and out-of-domain downstream tasks validate the superiority of our approach.
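The abstract only sketches the masking scheme, so the following is a minimal illustrative sketch rather than the paper's implementation. It assumes fixed-size chunking, a hypothetical id for the <SR> token (SR_ID), and a standard causal mask modified so that each <SR> position attends only within its own chunk, one plausible reading of "integrate the chunk's information into the corresponding <SR> token."

```python
import torch

SR_ID = 32000  # hypothetical vocabulary id for the special <SR> token

def insert_sr_tokens(token_ids, chunk_size):
    """Segment a token sequence into fixed-size chunks and append an
    <SR> token after each chunk. Fixed-size chunking is an assumption;
    the paper may segment text differently."""
    out = []
    for i in range(0, len(token_ids), chunk_size):
        out.extend(token_ids[i:i + chunk_size])
        out.append(SR_ID)
    return out

def build_attention_mask(token_ids):
    """Build a boolean attention mask (True = position may attend).

    Assumed behavior, reading the abstract loosely:
      * every position obeys the usual causal constraint;
      * an <SR> position is restricted to its own chunk, so it
        aggregates only that chunk's semantics;
      * ordinary tokens attend to earlier tokens and to earlier
        <SR> tokens as usual through the causal base mask.
    """
    n = len(token_ids)
    mask = torch.tril(torch.ones(n, n, dtype=torch.bool))  # causal base
    chunk_start = 0
    for pos, tok in enumerate(token_ids):
        if tok == SR_ID:
            # Block the <SR> row from everything before its own chunk.
            mask[pos, :chunk_start] = False
            chunk_start = pos + 1
    return mask

# Example: 6 tokens in chunks of 3 -> <SR> inserted after each chunk.
ids = insert_sr_tokens(list(range(6)), chunk_size=3)
print(ids)  # [0, 1, 2, 32000, 3, 4, 5, 32000]
print(build_attention_mask(ids).int())
```

Under these assumptions, later tokens still see earlier <SR> positions via the causal base mask, which is how they would draw on each chunk's aggregated summary in addition to individual historical tokens.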

Details

Database:
arXiv
Publication Type:
Report
Accession number:
edsarx.2406.10985
Document Type:
Working Paper