1. On the Effectiveness of Distillation in Mitigating Backdoors in Pre-trained Encoder
- Author
Tingxu Han, Shenghan Huang, Ziqi Ding, Weisong Sun, Yebo Feng, Chunrong Fang, Jun Li, Hanwei Qian, Cong Wu, Quanjun Zhang, Yang Liu, and Zhenyu Chen
- Abstract
In this paper, we study distillation as a defense against poisoned encoders in self-supervised learning (SSL); distillation was originally used as a defense in supervised learning. Distillation aims to extract knowledge from a given model (the teacher net) and transfer it to another (the student net). Here, we use it to distill benign knowledge from a poisoned pre-trained encoder and transfer it to a new encoder, resulting in a clean pre-trained encoder. In particular, we conduct an empirical study of the effectiveness and performance of distillation against poisoned encoders. Using two state-of-the-art backdoor attacks against pre-trained image encoders and four commonly used image classification datasets, our experimental results show that distillation reduces the attack success rate from 80.87% to 27.51%, at the cost of a 6.35% drop in accuracy. Moreover, we investigate the impact of the three core components of distillation on its performance: the teacher net, the student net, and the distillation loss. By comparing 4 teacher nets, 3 student nets, and 6 distillation losses, we find that fine-tuned teacher nets, warm-up-training-based student nets, and attention-based distillation loss perform best, respectively. (A minimal sketch of an attention-based distillation loss follows this entry.)
- Published
2024
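
The abstract reports that an attention-based distillation loss performed best. As a rough, non-authoritative illustration of what such a loss typically looks like, below is a minimal PyTorch sketch in the style of attention transfer; the paper does not specify its exact formulation here, and the function names, tensor shapes, and layer pairing are all assumptions.

```python
import torch
import torch.nn.functional as F

def attention_map(feat: torch.Tensor) -> torch.Tensor:
    # Collapse the channel dimension of a (B, C, H, W) feature map into
    # a spatial attention map, then L2-normalize it per sample.
    att = feat.pow(2).mean(dim=1)              # (B, H, W)
    return F.normalize(att.flatten(1), dim=1)  # (B, H*W), unit L2 norm

def attention_distill_loss(student_feats, teacher_feats):
    # Sum, over the distilled layers, of the mean squared distance
    # between paired student/teacher attention maps.
    return sum(
        (attention_map(s) - attention_map(t)).pow(2).sum(dim=1).mean()
        for s, t in zip(student_feats, teacher_feats)
    )

# Toy usage: random intermediate features stand in for the (possibly
# poisoned) teacher encoder and the warm-up-trained student encoder.
student_feats = [torch.randn(4, 64, 16, 16, requires_grad=True)]
teacher_feats = [torch.randn(4, 64, 16, 16)]
loss = attention_distill_loss(student_feats, teacher_feats)
loss.backward()
```

The intuition suggested by the abstract is that matching the teacher's attention on clean data transfers its benign feature knowledge to the student, while trigger-specific backdoor behavior, which clean inputs do not activate, is left behind.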