Rationalization through Concepts
Abstract
- Automated predictions require explanations to be interpretable by humans. One type of explanation is a rationale, i.e., a selection of input features, such as relevant text snippets, from which the model computes the outcome. However, a single overall selection does not provide a complete explanation, e.g., when a decision weighs several aspects. To this end, we present a novel self-interpretable model called ConRAT. Inspired by how human explanations for high-level decisions are often based on key concepts, ConRAT extracts a set of text snippets as concepts and infers which ones are described in the document. Then, it explains the outcome with a linear aggregation of concepts. Two regularizers drive ConRAT to build interpretable concepts. In addition, we propose two techniques to boost the rationale and predictive performance further. Experiments on both single- and multi-aspect sentiment classification tasks show that ConRAT is the first to generate concepts that align with human rationalization while using only the overall label. Further, it outperforms state-of-the-art methods trained on each aspect label independently.
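To make the architecture described above concrete, here is a minimal sketch of concept-based rationalization in PyTorch: each of K concepts attends over the tokens of a document (its attention weights act as a soft rationale), each concept representation is reduced to a scalar score, and the label is a linear aggregation of those scores. This is an illustrative assumption, not the authors' implementation; all names (ConceptRationalizer, n_concepts, etc.) are hypothetical, and the paper's two interpretability regularizers are omitted.

```python
# Sketch of concept-based rationalization (illustrative, not ConRAT itself).
import torch
import torch.nn as nn

class ConceptRationalizer(nn.Module):
    def __init__(self, vocab_size, d_model=64, n_concepts=4, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # One learned query per concept: "where in the text is this concept?"
        self.concept_queries = nn.Parameter(torch.randn(n_concepts, d_model))
        # Scalar score per concept (e.g., the sentiment it expresses)
        self.concept_scorer = nn.Linear(d_model, 1)
        # Linear aggregation of concept scores into the overall prediction
        self.aggregate = nn.Linear(n_concepts, n_classes)

    def forward(self, token_ids):
        h = self.embed(token_ids)                          # (B, T, d)
        # Attention of each concept over tokens -> soft rationales (B, K, T)
        attn = torch.softmax(self.concept_queries @ h.transpose(1, 2), dim=-1)
        concept_repr = attn @ h                            # (B, K, d)
        concept_scores = self.concept_scorer(concept_repr).squeeze(-1)  # (B, K)
        logits = self.aggregate(concept_scores)            # (B, n_classes)
        return logits, attn  # each attn row is one concept's rationale

model = ConceptRationalizer(vocab_size=1000)
tokens = torch.randint(0, 1000, (2, 30))   # batch of 2 documents, 30 tokens
logits, rationales = model(tokens)
print(logits.shape, rationales.shape)      # (2, 2) and (2, 4, 30)
```

Because the aggregation layer is linear, each concept's contribution to the prediction can be read off directly from its score and weight, which is the property that makes this family of models self-interpretable.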
Details
- Database: OpenAIRE
- Accession number: edsair.od.......185..c64149d2c1602fd622293354d3f4fdfa