
On the evaluation of deep learning interpretability methods for medical images under the scope of faithfulness.

Authors :
Lamprou, Vangelis
Kallipolitis, Athanasios
Maglogiannis, Ilias
Source :
Computer Methods & Programs in Biomedicine. Aug 2024, Vol. 253.
Publication Year :
2024

Abstract

Evaluating the interpretability of deep learning models is crucial for building trust and gaining insight into their decision-making processes. In this work, we employ class activation map (CAM) based attribution methods in a setting where only High-Resolution Class Activation Mapping (HiResCAM) is known to produce faithful explanations. The objective is to evaluate the quality of the attribution maps using quantitative metrics and to investigate whether faithfulness aligns with the metric results. We fine-tune pre-trained deep learning architectures on four medical image datasets and compute the corresponding attribution maps. The maps are evaluated against three well-established metrics. Our experimental findings suggest that the Area Over the Perturbation Curve (AOPC) and Max-Sensitivity scores favor the HiResCAM maps. In contrast, the Heatmap Assisted Accuracy Score (HAAS) offers no insight into our comparison, as it evaluates almost all maps as inaccurate. To this end, we further compare our computed values against values obtained from a diverse group of models trained on non-medical benchmark datasets, in order to obtain more responsive results. This study develops a series of experiments that examine the connection between faithfulness and quantitative metrics for medical attribution maps. HiResCAM preserves the gradient effect at the pixel level, ultimately producing high-resolution, informative and resilient mappings. This is in turn reflected in the AOPC and Max-Sensitivity results, which successfully identify the faithful algorithm. Regarding HAAS, our experiments show that it is sensitive to complex medical patterns, commonly characterized by strong color dependency and multiple attention areas.
• A framework for evaluating interpretability methods under faithfulness is proposed.
• The proposed framework is applied to four medical imaging datasets.
• Area Over the Perturbation Curve and Max-Sensitivity favor the faithful method.
• Heatmap Assisted Accuracy Score fails to identify the faithful method.
• Heatmap Assisted Accuracy Score is sensitive to complex patterns characterizing medical images.
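The AOPC metric mentioned in the abstract measures how quickly a model's class probability drops as the image regions ranked most relevant by an attribution map are perturbed step by step. A minimal sketch of the standard formulation (mean probability drop over perturbation steps) is shown below; the function name and inputs are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def aopc(prob_original, probs_after_perturbation):
    """Area Over the Perturbation Curve (sketch).

    prob_original: model probability for the target class on the
        unperturbed image.
    probs_after_perturbation: probabilities after perturbing the
        k most relevant regions, for k = 1..L, in the order given
        by the attribution map.

    Returns the mean drop in probability; a larger AOPC means the
    attribution map pinpoints regions the model actually relies on.
    """
    drops = prob_original - np.asarray(probs_after_perturbation, dtype=float)
    return float(drops.mean())

# Example: probability falls from 0.95 to 0.9, 0.7, 0.4 over 3 steps.
score = aopc(0.95, [0.9, 0.7, 0.4])  # mean of [0.05, 0.25, 0.55]
```

A faithful map such as HiResCAM should yield a steeper probability decline, and hence a higher AOPC, than a map that highlights regions the model does not actually use.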

Details

Language :
English
ISSN :
0169-2607
Volume :
253
Database :
Academic Search Index
Journal :
Computer Methods & Programs in Biomedicine
Publication Type :
Academic Journal
Accession number :
177746849
Full Text :
https://doi.org/10.1016/j.cmpb.2024.108238