
SAKD: Sparse attention knowledge distillation.

Authors :
Guo, Zhen
Zhang, Pengzhou
Liang, Peng
Source :
Image & Vision Computing, Jun 2024, Vol. 146.
Publication Year :
2024

Abstract

Deep learning techniques have gained significant interest due to their success in large-model scenarios. However, large models often require massive computational resources, which challenges end devices with limited storage. Transferring knowledge from large models to small ones while achieving comparable results with limited resources still requires further research. Knowledge distillation, which uses a teacher-student setup to migrate the capabilities of a large model to a small one, is widely used for model compression and knowledge transfer. In this paper, a novel knowledge distillation approach based on a sparse attention mechanism (SAKD) is proposed. SAKD computes attention using student features as queries and teacher features as keys and values, and sparsifies the attention values by random deactivation. The sparse attention values are then used to reweight the feature distance of each teacher-student feature pair to avoid negative transfer. Comprehensive experiments demonstrate the effectiveness and generality of the approach, and SAKD outperforms previous state-of-the-art methods on image classification tasks.

• Propose a novel knowledge distillation approach utilizing a sparse attention mechanism (SAKD).
• Use student features as queries and teacher features as keys and values, and sparsify the attention values by random deactivation.
• Use the sparse attention values to reweight the feature distance of each teacher-student feature pair to avoid negative transfer.
• SAKD outperforms previous state-of-the-art methods on image classification tasks (CIFAR-100 and ImageNet). [ABSTRACT FROM AUTHOR]
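
The abstract's description of SAKD suggests the following minimal sketch in PyTorch. It is an illustration only, not the authors' published implementation: the class name SparseAttentionKD, the projection layers, the dropout probability, and the cdist-based pairwise distance are all assumptions made for the example.

    # Hypothetical sketch of the SAKD reweighting idea described in the abstract.
    # Layer names, shapes, and hyperparameters are assumptions, not the authors' code.
    import torch
    import torch.nn as nn

    class SparseAttentionKD(nn.Module):
        """Reweight per-pair feature distances with dropout-sparsified cross-attention."""

        def __init__(self, student_dim, teacher_dim, embed_dim=128, drop_prob=0.5):
            super().__init__()
            self.query_proj = nn.Linear(student_dim, embed_dim)  # student features -> queries
            self.key_proj = nn.Linear(teacher_dim, embed_dim)    # teacher features -> keys
            self.align = nn.Linear(student_dim, teacher_dim)     # match dims for the distance term
            self.drop = nn.Dropout(drop_prob)                    # "random deactivation" of attention
            self.scale = embed_dim ** -0.5

        def forward(self, student_feats, teacher_feats):
            # student_feats: (B, Ns, Ds); teacher_feats: (B, Nt, Dt)
            q = self.query_proj(student_feats)                               # (B, Ns, E)
            k = self.key_proj(teacher_feats)                                 # (B, Nt, E)
            attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1) # (B, Ns, Nt)
            attn = self.drop(attn)                                           # sparsify attention values

            # Pairwise L2 distances between every student/teacher feature pair.
            s = self.align(student_feats)                                    # (B, Ns, Dt)
            dist = torch.cdist(s, teacher_feats, p=2)                        # (B, Ns, Nt)

            # Attention-reweighted distillation loss: pairs with small or dropped
            # attention contribute little, which limits negative transfer.
            return (attn * dist).mean()

A usage sketch, assuming intermediate feature maps have been flattened to (batch, tokens, channels):

    kd = SparseAttentionKD(student_dim=256, teacher_dim=512)
    loss = kd(torch.randn(8, 64, 256), torch.randn(8, 64, 512))  # scalar distillation loss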

Details

Language :
English
ISSN :
0262-8856
Volume :
146
Database :
Academic Search Index
Journal :
Image & Vision Computing
Publication Type :
Academic Journal
Accession number :
177372772
Full Text :
https://doi.org/10.1016/j.imavis.2024.105020