1. Aggressive Post-Training Compression on Extremely Large Language Models
- Authors
Zhang, Zining; Chen, Yao; He, Bingsheng; Zhang, Zhenjie
- Subjects
Computer Science - Computation and Language; Computer Science - Artificial Intelligence
- Abstract
The increasing size and complexity of Large Language Models (LLMs) pose challenges for their deployment on personal computers and mobile devices. Aggressive post-training model compression is necessary to reduce model size, but it often causes significant accuracy loss. To address this challenge, we propose a novel network pruning technique that achieves over 0.7 sparsity combined with sub-8-bit quantization. Our approach compresses prevailing LLMs within a few hours while incurring only a relatively small accuracy loss. Experimental evaluations demonstrate the method's effectiveness and its potential for practical deployment. By making LLMs available on consumer devices, our work can enable a new era of natural language processing applications with wide-ranging impacts.
- Published
2024
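
The abstract above combines two compression primitives: pruning to over 0.7 sparsity and quantization below 8 bits. The paper's actual algorithm is not described in this entry, so the following is only a minimal sketch of what those two operations mean mechanically, assuming simple magnitude pruning and symmetric uniform INT8 quantization; the function names and the use of NumPy are illustrative choices, not the authors' implementation.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float = 0.7) -> np.ndarray:
    """Zero out the smallest-magnitude entries so that `sparsity` fraction are zero."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric uniform quantization to signed 8-bit integers plus a scale factor."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

# Example: compress one hypothetical weight matrix post-training.
rng = np.random.default_rng(0)
w = rng.standard_normal((512, 512)).astype(np.float32)
w_sparse = magnitude_prune(w, sparsity=0.7)      # >= 70% of entries become zero
q, scale = quantize_int8(w_sparse)               # stored as int8 + one float scale
w_restored = q.astype(np.float32) * scale        # dequantize at inference time
print(f"achieved sparsity: {(q == 0).mean():.2f}, scale: {scale:.5f}")
```

Even this naive pipeline shows the storage arithmetic behind the abstract's claim: an int8 tensor at 0.7 sparsity (stored in a sparse format) occupies well under a quarter of the original FP32 footprint, which is what makes on-device deployment plausible.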