
CLIP-Embed-KD: Computationally Efficient Knowledge Distillation Using Embeddings as Teachers

Authors :
Nair, Lakshmi
Source :
Extended abstract: 28th IEEE High Performance Extreme Computing Conference (HPEC) 2024 - Outstanding short paper award
Publication Year :
2024

Abstract

Contrastive Language-Image Pre-training (CLIP) has been shown to improve the zero-shot generalization capabilities of language and vision models. In this paper, we extend CLIP for efficient knowledge distillation by using embeddings as teachers. Typical knowledge distillation frameworks require running forward passes through a teacher model, which is often prohibitive for billion- or trillion-parameter teachers. In these cases, using only the embeddings of the teacher model to guide the distillation can yield significant computational savings. Our preliminary findings show that CLIP-based knowledge distillation with embeddings can outperform full-scale knowledge distillation while using 9× less memory and 8× less training time. Code available at: https://github.com/lnairGT/CLIP-Distillation/

Comment: Short paper - 5 pages; 5 figures
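The core idea in the abstract, training a student to match precomputed teacher embeddings instead of running forward passes through the teacher, can be sketched with a CLIP-style contrastive loss. The following is a minimal numpy illustration, not the authors' code: the function names, the stand-in random embeddings, and the temperature value are all assumptions for demonstration.

```python
import numpy as np

def embedding_distillation_loss(student_emb, teacher_emb, temperature=0.07):
    """CLIP-style contrastive distillation loss: each student embedding
    should be most similar to its own (precomputed) teacher embedding.
    Note: this is an illustrative sketch, not the paper's implementation."""
    # L2-normalize rows so dot products become cosine similarities
    s = student_emb / np.linalg.norm(student_emb, axis=1, keepdims=True)
    t = teacher_emb / np.linalg.norm(teacher_emb, axis=1, keepdims=True)
    logits = (s @ t.T) / temperature            # (batch, batch) similarity matrix
    # Cross-entropy with the diagonal (matching pairs) as targets
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
# Stand-in for embeddings cached from a large teacher (no teacher forward pass needed)
teacher = rng.normal(size=(4, 8))
student_good = teacher + 0.01 * rng.normal(size=(4, 8))  # student close to teacher
student_bad = rng.normal(size=(4, 8))                    # unrelated student
assert embedding_distillation_loss(student_good, teacher) < \
       embedding_distillation_loss(student_bad, teacher)
```

Because the teacher embeddings can be computed once offline and reused every epoch, the teacher never has to be held in memory during student training, which is the source of the memory and time savings the abstract reports.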

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2404.06170
Document Type :
Working Paper