Explanation leaks: Explanation-guided model extraction attacks.
- Author
- Yan, Anli; Huang, Teng; Ke, Lishan; Liu, Xiaozhang; Chen, Qi; Dong, Changyu
- Subjects
- Machine learning; Artificial intelligence; Information modeling; Explanation; Organizational transparency
- Abstract
- Explainable artificial intelligence (XAI) is gradually becoming a key component of many artificial intelligence systems. However, this pursuit of transparency may pose privacy threats to model confidentiality, since an adversary can obtain more critical information about the model. In this paper, we systematically study how model decision explanations affect model extraction attacks, which aim to steal the functionality of a black-box model. Based on the threat models we formulate, we propose XaMEA, a novel XAI-aware model extraction attack framework that exploits spatial knowledge from decision explanations. XaMEA is designed to be model-agnostic: it achieves considerable extraction fidelity on arbitrary machine learning (ML) models. Moreover, we show that the attack remains effective even if the target model does not proactively provide explanations. Extensive empirical results verify the effectiveness of XaMEA and disclose the privacy leakage caused by decision explanations. We hope this work highlights the need for techniques that better trade off the transparency and privacy of ML models.
• We propose XaMEA, a framework of three XAI-aware model extraction attack architectures.
• We further carry out XAI-aware model extraction attacks against non-explanation target models.
• We evaluate the attack effectiveness of XaMEA with an exhaustive set of experiments. [ABSTRACT FROM AUTHOR]
- Published
- 2023
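For orientation only: the record above summarizes explanation-guided extraction at the level of the abstract, so the sketch below illustrates just the generic black-box model extraction loop it builds on (the attacker queries a victim model, collects its answers, trains a surrogate, and measures fidelity as agreement with the victim). It is not the authors' XaMEA architecture; in the explanation-guided setting the query API would additionally return a decision explanation that the attacker folds into surrogate training. All names here (query_victim, the scikit-learn stand-in models) are illustrative assumptions.

```python
# Generic black-box model-extraction sketch (illustrative only, not XaMEA).
# Assumes numpy and scikit-learn; the victim is a stand-in model that the
# attacker can only query through a label-returning API.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# --- Victim side: a model the attacker cannot inspect, only query. ---
X_owner, y_owner = make_classification(n_samples=2000, n_features=20, random_state=0)
victim = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_owner, y_owner)

def query_victim(x):
    """Black-box API: returns only predicted labels for the queried inputs."""
    return victim.predict(x)

# --- Attacker side: synthesize queries, collect labels, train a surrogate. ---
n_queries = 5000
X_query = rng.normal(size=(n_queries, X_owner.shape[1]))  # attacker-chosen inputs
y_query = query_victim(X_query)                           # victim's answers

surrogate = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
surrogate.fit(X_query, y_query)

# --- Fidelity: how often the surrogate agrees with the victim on fresh inputs. ---
X_test = rng.normal(size=(1000, X_owner.shape[1]))
fidelity = accuracy_score(query_victim(X_test), surrogate.predict(X_test))
print(f"extraction fidelity: {fidelity:.3f}")
```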