
On the Effectiveness of Distillation in Mitigating Backdoors in Pre-trained Encoder

Authors :
Han, Tingxu
Huang, Shenghan
Ding, Ziqi
Sun, Weisong
Feng, Yebo
Fang, Chunrong
Li, Jun
Qian, Hanwei
Wu, Cong
Zhang, Quanjun
Liu, Yang
Chen, Zhenyu
Publication Year :
2024

Abstract

In this paper, we study distillation, a defense originally developed for supervised learning, as a defense against poisoned encoders in self-supervised learning (SSL). Distillation aims to extract knowledge from a given model (the teacher net) and transfer it to another (the student net). Here, we use it to distill benign knowledge from poisoned pre-trained encoders and transfer it to a new, clean pre-trained encoder. In particular, we conduct an empirical study of the effectiveness and performance of distillation against poisoned encoders. Using two state-of-the-art backdoor attacks against pre-trained image encoders and four commonly used image classification datasets, our experimental results show that distillation reduces the attack success rate from 80.87% to 27.51% at the cost of a 6.35% drop in accuracy. Moreover, we investigate the impact of three core components of distillation on its performance: the teacher net, the student net, and the distillation loss. By comparing 4 different teacher nets, 3 student nets, and 6 distillation losses, we find that fine-tuned teacher nets, warm-up-training-based student nets, and attention-based distillation loss perform best, respectively.
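To make the teacher/student setup concrete, the following is a minimal sketch of a classic distillation loss in the style of Hinton et al.: a KL divergence between temperature-softened teacher and student outputs. Note this is an illustrative, generic logit-based formulation, not the paper's own method — the paper distills pre-trained encoders and finds an attention-based distillation loss to work best, whose exact form is not given in this record. All function names here are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature softens the
    distribution, exposing more of the teacher's 'dark knowledge'."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) over temperature-softened outputs,
    scaled by T^2 so gradients keep a comparable magnitude across
    temperatures (the standard convention in logit distillation).
    Hypothetical sketch, not the attention-based loss from the paper."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

In the defense studied here, the teacher would be the (fine-tuned) poisoned encoder and the student a fresh encoder trained to match it on clean data, so only benign behavior transfers; the loss above would then be applied to intermediate representations rather than classification logits.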

Details

Database :
OAIster
Publication Type :
Electronic Resource
Accession number :
edsoai.on1438533852
Document Type :
Electronic Resource