A Framework to Learn with Interpretation
- Source :
- Thirty-Fifth Annual Conference on Neural Information Processing Systems (NeurIPS 2021), Dec 2021, Sydney, Australia
- Publication Year :
- 2021
- Publisher :
- HAL CCSD, 2021.
Abstract
- To tackle interpretability in deep learning, we present a novel framework to jointly learn a predictive model and its associated interpretation model. The interpreter provides both local and global interpretability about the predictive model in terms of human-understandable high-level attribute functions, with minimal loss of accuracy. This is achieved by a dedicated architecture and well-chosen regularization penalties. We seek a small dictionary of high-level attribute functions that take as inputs the outputs of selected hidden layers and whose outputs feed a linear classifier. We impose strong conciseness on the activation of attributes with an entropy-based criterion while enforcing fidelity to both inputs and outputs of the predictive model. A detailed pipeline to visualize the learnt features is also developed. Moreover, besides generating interpretable models by design, our approach can be specialized to provide post-hoc interpretations for a pre-trained neural network. We validate our approach against several state-of-the-art methods on multiple datasets and show its efficacy on both kinds of tasks.
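The abstract does not give the exact loss terms, so the following is only a rough NumPy sketch of the kind of objective it describes: attribute activations computed from a hidden layer, a linear classifier on top of them, an output-fidelity term, and an entropy-based conciseness term. The function names, the ReLU attribute map, the cross-entropy form of fidelity, the row-normalized Shannon entropy, and the 0.1 weight are all assumptions made for illustration, not the paper's definitions. Input fidelity (which would need a decoder) is omitted.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def output_fidelity(pred_probs, interp_probs, eps=1e-12):
    # Assumed form: cross-entropy of the interpreter's output against the
    # predictor's output, averaged over the batch.
    return float(-(pred_probs * np.log(interp_probs + eps)).sum(axis=1).mean())

def conciseness_entropy(attrs, eps=1e-12):
    # Assumed form: normalize each sample's non-negative attribute activations
    # to a distribution and take its Shannon entropy. Low entropy means only
    # a few attributes are active for that sample.
    p = attrs / (attrs.sum(axis=1, keepdims=True) + eps)
    return float(-(p * np.log(p + eps)).sum(axis=1).mean())

rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 8))               # stand-in for selected hidden-layer outputs
A = rng.normal(size=(8, 5))                    # hypothetical attribute dictionary (5 attributes)
W = rng.normal(size=(5, 3))                    # linear classifier over attributes (3 classes)

attrs = np.maximum(hidden @ A, 0.0)            # non-negative attribute activations
interp_probs = softmax(attrs @ W)              # interpreter output
pred_probs = softmax(rng.normal(size=(4, 3)))  # stand-in for the predictor's output

# Illustrative combined penalty; the weight 0.1 is arbitrary.
loss = output_fidelity(pred_probs, interp_probs) + 0.1 * conciseness_entropy(attrs)
```

In this sketch, a sample that activates a single attribute has near-zero entropy, while one that activates all attributes equally reaches the maximum, log of the number of attributes. Minimizing the term therefore pushes each sample toward a short list of active attributes.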
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- Thirty-Fifth Annual Conference on Neural Information Processing Systems (NeurIPS 2021), Dec 2021, Sydney, Australia
- Accession number :
- edsair.doi.dedup.....69005442e45199e0a33aacde38b42719