Advances in Pruning and Quantization for Natural Language Processing
- Author
Ummara Bibi, Mahrukh Mazhar, Dilshad Sabir, Muhammad Fasih Uddin Butt, Ali Hassan, Mustansar Ali Ghazanfar, Arshad Ali Khan, and Wadood Abdul
- Subjects
Convolutional neural network, natural language processing, quantization, NLP models, compression, pruning, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
With ongoing advancements in natural language processing (NLP) and deep learning methods, the demand for computational and memory resources has grown considerably, underscoring the need for efficient and compact models in resource-constrained environments. This paper provides a comprehensive overview of the most recent advances in pruning and quantization methods for deep neural networks. Numerous cutting-edge techniques that harness the complementary advantages of pruning and quantization are analyzed, highlighting their effectiveness in reducing model size, enhancing computational efficiency, and minimizing memory usage. These techniques include Quantization and Sparsity Aware Fine Tuning, Compression Learning by In-Parallel Pruning-Quantization, GroupReduce, Quantization-Pruned Attention, Structured Pruning and Normalized Linear Quantization (Prune and NLQ), Quantization and Pruning for Sentiment Analysis, an Automatic mixed-precision Quantization approach for BERT compression (AQ-BERT), Mixed Precision Quantization, Unstructured Pruning and Quantization, and Magnitude Pruning. The datasets utilized, models employed, and outcomes achieved are examined for each technique. The application of pruning and quantization across diverse deep-learning tasks, NLP, and sentiment analysis is also discussed. Moreover, issues such as compatibility with hardware configurations, optimization complexity, and accuracy degradation, among other constraints, are analyzed. Several challenges and limitations of weight and unit pruning for memory optimization, and of quantization techniques that reduce numerical precision, are explored. The in-depth analysis of these state-of-the-art techniques and experiments provides a broad understanding of the field, together with strategies for effectively reducing the computational and memory demands of neural networks without compromising their performance.
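Among the surveyed techniques, magnitude pruning and post-training quantization are the most widely used building blocks. The sketch below is not taken from the paper; it is a minimal illustration, assuming PyTorch, of how the two are commonly combined: global L1 magnitude pruning followed by dynamic int8 quantization. The toy classifier, the 60% sparsity level, and the qint8 dtype are illustrative assumptions.

```python
# Minimal sketch (not from the paper): global magnitude pruning followed by
# post-training dynamic quantization of a small text classifier in PyTorch.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy bag-of-words sentiment classifier; stands in for a larger NLP model.
model = nn.Sequential(
    nn.Linear(10_000, 256),
    nn.ReLU(),
    nn.Linear(256, 2),
)

# Magnitude (L1) pruning: zero out the 60% smallest-magnitude weights
# across all Linear layers, then make the sparsity permanent.
to_prune = [(m, "weight") for m in model if isinstance(m, nn.Linear)]
prune.global_unstructured(to_prune, pruning_method=prune.L1Unstructured, amount=0.6)
for module, name in to_prune:
    prune.remove(module, name)

# Post-training dynamic quantization: weights are stored as int8 and
# activations are quantized on the fly at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Sanity check: the compressed model still produces logits of the right shape.
logits = quantized(torch.randn(1, 10_000))
print(logits.shape)  # torch.Size([1, 2])
```

Note that unstructured magnitude pruning alone leaves the weight tensors dense (with zeros), so the memory and latency gains in this sketch come mainly from the int8 weight storage; realizing gains from sparsity itself requires sparse storage formats or structured pruning, as the survey discusses.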
- Published
- 2024