1. Intersection of Performance, Interpretability, and Fairness in Neural Prototype Tree for Chest X-Ray Pathology Detection: Algorithm Development and Validation Study
- Author
-
Hongbo Chen, Myrtede Alfred, Andrew D Brown, Angela Atinga, and Eldan Cohen
- Subjects
Medicine - Abstract
BackgroundWhile deep learning classifiers have shown remarkable results in detecting chest X-ray (CXR) pathologies, their adoption in clinical settings is often hampered by the lack of transparency. To bridge this gap, this study introduces the neural prototype tree (NPT), an interpretable image classifier that combines the diagnostic capability of deep learning models and the interpretability of the decision tree for CXR pathology detection. ObjectiveThis study aimed to investigate the utility of the NPT classifier in 3 dimensions, including performance, interpretability, and fairness, and subsequently examined the complex interaction between these dimensions. We highlight both local and global explanations of the NPT classifier and discuss its potential utility in clinical settings. MethodsThis study used CXRs from the publicly available Chest X-ray 14, CheXpert, and MIMIC-CXR datasets. We trained 6 separate classifiers for each CXR pathology in all datasets, 1 baseline residual neural network (ResNet)–152, and 5 NPT classifiers with varying levels of interpretability. Performance, interpretability, and fairness were measured using the area under the receiver operating characteristic curve (ROC AUC), interpretation complexity (IC), and mean true positive rate (TPR) disparity, respectively. Linear regression analyses were performed to investigate the relationship between IC and ROC AUC, as well as between IC and mean TPR disparity. ResultsThe performance of the NPT classifier improved as the IC level increased, surpassing that of ResNet-152 at IC level 15 for the Chest X-ray 14 dataset and IC level 31 for the CheXpert and MIMIC-CXR datasets. The NPT classifier at IC level 1 exhibited the highest degree of unfairness, as indicated by the mean TPR disparity. The magnitude of unfairness, as measured by the mean TPR disparity, was more pronounced in groups differentiated by age (chest X-ray 14 0.112, SD 0.015; CheXpert 0.097, SD 0.010; MIMIC 0.093, SD 0.017) compared to sex (chest X-ray 14 0.054 SD 0.012; CheXpert 0.062, SD 0.008; MIMIC 0.066, SD 0.013). A significant positive relationship between interpretability (ie, IC level) and performance (ie, ROC AUC) was observed across all CXR pathologies (P
- Published
- 2024
- Full Text
- View/download PDF