Author: "Cabitza, F." - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Cabitza, F."' showing total 573 results

1. Towards Better Ways to Assess Predictive Computing in Medicine: On Reliability, Robustness, and Utility

Author: Carpentieri, B, Lecca, P, Cabitza, F, and Campagner, A
Abstract: Computational classification systems built using machine learning (ML) techniques are increasingly being evaluated and employed in medical settings for a number of purposes and applications, including diagnosis, prognosis, and risk stratification. However, evaluation and validation practices that are commonly used and adopted in the application of ML to other disciplines are unlikely to be meaningfully applicable to medicine. In fact, otherwise technically sound systems have been found to perform poorly in real settings, a concept that has been termed the “last mile of implementation.” In this chapter, we will focus on three main factors underlying the so-called last mile: the impact of observer variability on ground truth reliability; the meaningfulness and appropriateness of commonly adopted performance measures; and the issue of replicability in ML studies. We will discuss the above-mentioned issues, and we will delineate possible solutions and concepts to address them.
Published: 2024

2. Invisible to Machines: Designing AI that Supports Vision Work in Radiology

Author: Anichini, G, Natali, C, and Cabitza, F
Abstract: In this article we provide an analysis focusing on clinical use of two deep learning-based automatic detection tools in the field of radiology. The value of these technologies conceived to assist the physicians in the reading of imaging data (like X-rays) is generally assessed by the human-machine performance comparison, which does not take into account the complexity of the interpretation process of radiologists in its social, tacit and emotional dimensions. In this radiological vision work, data which informs the physician about the context surrounding a visible anomaly are essential to the definition of its pathological nature. Likewise, experiential data resulting from the contextual tacit knowledge that regulates professional conduct allows for the assessment of an anomaly according to the radiologist's, and patient's, experience. These data, which remain excluded from artificial intelligence processing, question the gap between the norms incorporated by the machine and those leveraged in the daily work of radiologists. The possibility that automated detection may modify the incorporation or the exercise of tacit knowledge raises questions about the impact of AI technologies on medical work. This article aims to highlight how the standards that emerge from the observation practices of radiologists challenge the automation of their vision work, but also under what conditions AI technologies are considered "objective" and trustworthy by professionals.
Published: 2024

3. Three-way decision in machine learning tasks: a systematic review

Author: Campagner, A, Milella, F, Ciucci, D, and Cabitza, F
Abstract: In this article, we survey the applications of Three-way decision theory (TWD) in machine learning (ML), focusing in particular on four tasks: weakly supervised learning and multi-source data management, missing data management, uncertainty quantification in classification, and uncertainty quantification in clustering. For each of these four tasks we present the results of a systematic review of the literature, by which we report on the main characteristics of the current state of the art, as well as on the quality of reporting and reproducibility level of the works found in the literature. To this aim, we discuss the main benefits, limitations and issues found in the reviewed articles, and we give clear indications and directions for quality improvement that are informed by validation, reporting, and reproducibility standards, guidelines and best practice that have recently emerged in the ML field. Finally, we discuss the more promising and relevant directions for future research in regard to TWD.
Published: 2024

4. Dissimilar Similarities: Comparing Human and Statistical Similarity Evaluation in Medical AI

Author: Torra, V, Narukawa, Y, Kikuchi, H, Cabitza, F, Famiglini, L, Campagner, A, Sconfienza, LM, Fusco, S, Caccavella, V, and Gallazzi, E
Abstract: This study explores the concept of similarity in machine learning (ML) and its congruence with human judgment in medical contexts, focusing primarily on radiology. We conducted a user study involving two radiologists and two orthopedic and spine surgeons. These experts evaluated the similarity of 72 cases, selected from a larger dataset by an ML model based on Cosine and Euclidean distances, in comparison to 18 representative base cases of vertebral fractures. Our analysis focused on correlating these ML-derived distances with the experts’ assessments. The findings reveal that: (1) both Cosine and Euclidean distances had limited correlation with human judgments; (2) Cosine distances showed a marginally higher correlation than Euclidean distances; despite the limitations due to the small samples of evaluations and evaluators, our findings emphasize the necessity for ongoing research to enhance AI similarity metrics, aiming for greater human-centricity and relevance, particularly considering their critical role in ML training and inference. Our study’s implications are far-reaching, advocating for a comprehensive reevaluation of similarity assessments in AI to achieve a closer alignment with human cognitive processes, extending well beyond the realm of medical imaging.
Published: 2024

5. Sharing reliable information worldwide: healthcare strategies based on artificial intelligence need external validation. Position paper.

Author: Pennestrì, F., Cabitza, F., Picerno, N., and Banfi, G.
Subjects: *MACHINE learning, *ARTIFICIAL intelligence, *ORTHOPEDIC surgery, *COVID-19, *PUBLIC health
Abstract: Training machine learning models using data from severe COVID-19 patients admitted to a central hospital, where entire wards are specifically dedicated to COVID-19, may yield predictions that differ significantly from those generated using data collected from patients admitted to a high-volume specialized hospital for orthopedic surgery, where COVID-19 is only a secondary diagnosis. This disparity arises despite the two hospitals being geographically close (within 20 kilometers). While machine learning can facilitate rapid public health responses, rigorous external validation and continuous monitoring are essential to ensure reliability and safety. [ABSTRACT FROM AUTHOR]
Published: 2025

6. The Challenges and Promises of Artificial Intelligence in the Contemporary Society: A Critical Perspective

Author: Marconi, Luca and Cabitza, Federico
Abstract: The increasing integration of Artificial Intelligence (AI) into various domains signals a transformative era. This study critically examines the dual-edged nature of AI’s adoption, focusing on the clinical context to highlight the complexity of healthcare and the indispensable role of human expertise. It explores AI’s potential to enhance efficiency and broaden service access, while also cautioning against risks like skill erosion (deskilling) and overlooking nuanced, non-quantifiable elements of care. Emphasizing a critical, interdisciplinary perspective, this research advocates for balanced AI adoption, mindful of both its advantages and inherent costs, including overlooked externalities. By extending its analysis beyond healthcare, the study underscores the necessity of a thoughtful, multidisciplinary approach in designing and implementing AI technologies. This approach aims to ensure the equitable distribution of AI’s benefits and the thoughtful addressal of its challenges, fostering a society that benefits from AI’s potential while remaining vigilant of its limitations.
Published: 2024

7. Regole per l’intelligenza artificiale

Author: Ricci, S, Rossetti, A, and Cabitza, F
Published: 2024

8. Second opinion machine learning for fast-track pathway assignment in hip and knee replacement surgery: the use of patient-reported outcome measures

Author: Campagner, Andrea, Milella, Frida, Banfi, Giuseppe, and Cabitza, Federico
Abstract: Background: The frequency of hip and knee arthroplasty surgeries has been rising steadily in recent decades. This trend is attributed to an aging population, leading to increased demands on healthcare systems. Fast Track (FT) surgical protocols, perioperative procedures designed to expedite patient recovery and early mobilization, have demonstrated efficacy in reducing hospital stays, convalescence periods, and associated costs. However, the criteria for selecting patients for FT procedures have not fully capitalized on the available patient data, including patient-reported outcome measures (PROMs). Methods: Our study focused on developing machine learning (ML) models to support decision making in assigning patients to FT procedures, utilizing data from patients' self-reported health status. These models are specifically designed to predict the potential health status improvement in patients initially selected for FT. Our approach focused on techniques inspired by the concept of controllable AI. This includes eXplainable AI (XAI), which aims to make the model's recommendations comprehensible to clinicians, and cautious prediction, a method used to alert clinicians about potential control losses, thereby enhancing the models' trustworthiness and reliability. Results: Our models were trained and tested using a dataset comprising 899 records from individual patients admitted to the FT program at IRCCS Ospedale Galeazzi-Sant'Ambrogio. After training and selecting hyper-parameters, the models were assessed using a separate internal test set. The interpretable models demonstrated performance on par with or even better than the most effective 'black-box' model (Random Forest). These models achieved sensitivity, specificity, and positive predictive value (PPV) exceeding 70%, with an area under the curve (AUC) greater than 80%. The cautious prediction models exhibited enhanced performance while maintaining satisfactory coverage (over 50%).
Further, when externally validated on a
Published: 2024

9. Rising adoption of artificial intelligence in scientific publishing: evaluating the role, risks, and ethical implications in paper drafting and review process

Author: Carobene, Anna, Padoan, Andrea, Cabitza, Federico, Banfi, Giuseppe, and Plebani, Mario
Abstract: Background: In the rapidly evolving landscape of artificial intelligence (AI), scientific publishing is experiencing significant transformations. AI tools, while offering unparalleled efficiencies in paper drafting and peer review, also introduce notable ethical concerns. Content: This study delineates AI's dual role in scientific publishing: as a co-creator in the writing and review of scientific papers and as an ethical challenge. We first explore the potential of AI as an enhancer of efficiency, efficacy, and quality in creating scientific papers. A critical assessment follows, evaluating the risks vs. rewards for researchers, especially those early in their careers, emphasizing the need to maintain a balance between AI's capabilities and fostering independent reasoning and creativity. Subsequently, we delve into the ethical dilemmas of AI's involvement, particularly concerning originality, plagiarism, and preserving the genuine essence of scientific discourse. The evolving dynamics further highlight an overlooked aspect: the inadequate recognition of human reviewers in the academic community. With the increasing volume of scientific literature, tangible metrics and incentives for reviewers are proposed as essential to ensure a balanced academic environment. Summary: AI's incorporation in scientific publishing is promising yet comes with significant ethical and operational challenges. The role of human reviewers is accentuated, ensuring authenticity in an AI-influenced environment. Outlook: As the scientific community treads the path of AI integration, a balanced symbiosis between AI's efficiency and human discernment is pivotal. Emphasizing human expertise, while exploiting artificial intelligence responsibly, will determine the trajectory of an ethically sound and efficient AI-augmented future in scientific publishing.
Published: 2024

10. Explainable Artificial Intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions

Author: Longo, Luca, Brcic, Mario, Cabitza, Federico, Choi, Jaesik, Confalonieri, Roberto, Del Ser, Javier, Guidotti, Riccardo, Hayashi, Yoichi, Herrera, Francisco, Holzinger, Andreas, Jiang, Richard, Khosravi, Hassan, Lecue, Freddy, Malgieri, Gianclaudio, Páez, Andrés, Samek, Wojciech, Schneider, Johannes, Speith, Timo, and Stumpf, Simone
Abstract: Understanding black box models has become paramount as systems based on opaque Artificial Intelligence (AI) continue to flourish in diverse real-world applications. In response, Explainable AI (XAI) has emerged as a field of research with practical and ethical benefits across various domains. This paper highlights the advancements in XAI and its application in real-world scenarios and addresses the ongoing challenges within XAI, emphasizing the need for broader perspectives and collaborative efforts. We bring together experts from diverse fields to identify open problems, striving to synchronize research agendas and accelerate XAI in practical applications. By fostering collaborative discussion and interdisciplinary cooperation, we aim to propel XAI forward, contributing to its continued success. We aim to develop a comprehensive proposal for advancing XAI. To achieve this goal, we present a manifesto of 28 open problems categorized into nine categories. These challenges encapsulate the complexities and nuances of XAI and offer a road map for future research. For each problem, we provide promising research directions in the hope of harnessing the collective intelligence of interested stakeholders.
Published: 2024

11. Evidence-based XAI: An empirical approach to design more effective and explainable decision support systems

Author: Famiglini, L, Campagner, A, Barandas, M, La Maida, GA, Gallazzi, E, and Cabitza, F
Abstract: This paper proposes a user study aimed at evaluating the impact of Class Activation Maps (CAMs) as an eXplainable AI (XAI) method in a radiological diagnostic task, the detection of thoracolumbar (TL) fractures from vertebral X-rays. In particular, we focus on two oft-neglected features of CAMs, that is granularity and coloring, in terms of what features, lower-level vs higher-level, should the maps highlight and adopting which coloring scheme, to bring better impact to the decision-making process, both in terms of diagnostic accuracy (that is effectiveness) and of user-centered dimensions, such as perceived confidence and utility (that is satisfaction), depending on case complexity, AI accuracy, and user expertise. Our findings show that lower-level features CAMs, which highlight more focused anatomical landmarks, are associated with higher diagnostic accuracy than higher-level features CAMs, particularly among experienced physicians. Moreover, despite the intuitive appeal of semantic CAMs, traditionally colored CAMs consistently yielded higher diagnostic accuracy across all groups. Our results challenge some prevalent assumptions in the XAI field and emphasize the importance of adopting an evidence-based and human-centered approach to design and evaluate AI- and XAI-assisted diagnostic tools. To this aim, the paper also proposes a hierarchy of evidence framework to help designers and practitioners choose the XAI solutions that optimize performance and satisfaction on the basis of the strongest evidence available or to focus on the gaps in the literature that need to be filled to move from opinionated and eminence-based research to one more based on empirical evidence and end-user work and preferences.
Published: 2024

12. Never tell me the odds: Investigating pro-hoc explanations in medical decision making

Author: Cabitza, F, Natali, C, Famiglini, L, Campagner, A, Caccavella, V, and Gallazzi, E
Abstract: This paper examines a kind of explainable AI, centered around what we term pro-hoc explanations, that is a form of support that consists of offering alternative explanations (one for each possible outcome) instead of a specific post-hoc explanation following specific advice. Specifically, our support mechanism utilizes explanations by examples, featuring analogous cases for each category in a binary setting. Pro-hoc explanations are an instance of what we called frictional AI, a general class of decision support aimed at achieving a useful compromise between the increase of decision effectiveness and the mitigation of cognitive risks, such as over-reliance, automation bias and deskilling. To illustrate an instance of frictional AI, we conducted an empirical user study to investigate its impact on the task of radiological detection of vertebral fractures in x-rays. Our study engaged 16 orthopedists in a ‘human-first, second-opinion’ interaction protocol. In this protocol, clinicians first made initial assessments of the x-rays without AI assistance and then provided their final diagnosis after considering the pro-hoc explanations. Our findings indicate that physicians, particularly those with less experience, perceived pro-hoc XAI support as significantly beneficial, even though it did not notably enhance their diagnostic accuracy. However, their increased confidence in final diagnoses suggests a positive overall impact. Given the promisingly high effect size observed, our results advocate for further research into pro-hoc explanations specifically, and into the broader concept of frictional AI.
Published: 2024

13. Evaluation of uncertainty quantification methods in multi-label classification: A case study with automatic diagnosis of electrocardiogram

Author: Barandas, M, Famiglini, L, Campagner, A, Folgado, D, Simao, R, Cabitza, F, and Gamboa, H
Abstract: Artificial Intelligence (AI) use in automated Electrocardiogram (ECG) classification has continuously attracted the research community's interest, motivated by its promising results. Despite this great promise, limited attention has been paid to the robustness of the results, which is a key element for their implementation in clinical practice. Uncertainty Quantification (UQ) is critical for trustworthy and reliable AI, particularly in safety-critical domains such as medicine. Estimating uncertainty in Machine Learning (ML) model predictions has been extensively used for Out-of-Distribution (OOD) detection under single-label tasks. However, the use of UQ methods in multi-label classification remains underexplored. This study goes beyond developing highly accurate models, comparing five uncertainty quantification methods using the same Deep Neural Network (DNN) architecture across various validation scenarios, including internal and external validation as well as OOD detection, taking multi-label ECG classification as the example domain. We show the importance of external validation and its impact on classification performance, uncertainty estimates quality, and calibration. Ensemble-based methods yield more robust uncertainty estimations than single network or stochastic methods. Although current methods still have limitations in accurately quantifying uncertainty, particularly in the case of dataset shift, incorporating uncertainty estimates with a classification with a rejection option improves the ability to detect such changes. Moreover, we show that using uncertainty estimates as a criterion for sample selection in an active learning setting results in greater improvements in classification performance compared to random sampling.
Published: 2024

14. Ensemble Predictors: Possibilistic Combination of Conformal Predictors for Multivariate Time Series Classification

Author: Campagner, Andrea, Barandas, Marília, Folgado, Duarte, Gamboa, Hugo, and Cabitza, Federico
Abstract: In this article we propose a conceptual framework to study ensembles of conformal predictors (CP), that we call Ensemble Predictors (EP). Our approach is inspired by the application of imprecise probabilities in information fusion. Based on the proposed framework, we study, for the first time in the literature, the theoretical properties of CP ensembles in a general setting, by focusing on simple and commonly used possibilistic combination rules. We also illustrate the applicability of the proposed methods in the setting of multivariate time-series classification, showing that these methods provide better performance (in terms of robustness, conservativeness, accuracy and running time) than both standard classification algorithms and other combination rules proposed in the literature, on a large set of benchmarks from the UCR time series archive.
Published: 2024

15. Preface

Author: Holzinger, A, Kieseberg, P, Cabitza, F, Campagner, A, Tjoa, AM, and Weippl, E
Published: 2023

16. Biomarkers for Mixed Dementia: a hard bone to bite? Preliminary analyses and promising results for a debated topic

Author: Fracasso, F, Gasparini, F, Milella, F, Campagner, A, Famiglini, L, Arosio, B, Rossi, P, Annoni, G, and Cabitza, F
Abstract: Dementia refers to a group of neurodegenerative disorders that impact the cognitive function of an increasing number of individuals. Because of the variety of manifestations, the idea of mixed dementia has recently garnered increased awareness and attention from the scientific community. In this work, we describe a high-quality dataset, as well as the findings of a preliminary analysis devoted to investigating the potential of computational methods that are highly indicative of mixed dementia. We will specifically describe the findings of a phenotypic stratification analysis, based on clustering approaches, that highlights possibly significant aspects of mixed dementia, paving the way for further research devoted to the application of Machine Learning techniques to the robust and early diagnosis of mixed dementia.
Published: 2023

17. Explainability meets uncertainty quantification: Insights from feature-based model fusion on multimodal time series

Author: Folgado, D, Barandas, M, Famiglini, L, Santos, R, Cabitza, F, and Gamboa, H
Abstract: Feature importance evaluation is one of the prevalent approaches to interpreting Machine Learning (ML) models. A drawback of using these methods for high-dimensional datasets is that they often lead to high-dimensional explanation output that hinders human analysis. This is especially true for explaining multimodal ML models, where the problem's complexity is further exacerbated by the inclusion of multiple data modalities and an increase in the overall number of features. This work proposes a novel approach to lower the complexity of feature-based explanations. The proposed approach is based on uncertainty quantification techniques, allowing for a principled way of reducing the number of modalities required to explain the model's predictions. We evaluated our method in three multimodal datasets comprising physiological time series. Results show that the proposed method can reduce the complexity of the explanations while maintaining a high level of accuracy in the predictions. This study illustrates an innovative example of the intersection between the disciplines of uncertainty quantification and explainable artificial intelligence.
Published: 2023

18. The Impact of Gender and Personality in Human-AI Teaming: The Case of Collaborative Question Answering

Author: Milella, F, Natali, C, Scantamburlo, T, Campagner, A, and Cabitza, F
Abstract: This paper discusses the results of an exploratory study aimed at investigating the impact of conversational agents (CAs) and specifically their agential characteristics on collaborative decision-making processes. The study involved 29 participants divided into 8 small teams engaged in a question-and-answer trivia-style game with the support of a text-based CA, characterized by two independent binary variables: personality (gentle and cooperative vs blunt and uncooperative) and gender (female vs male). A semi-structured group interview was conducted at the end of the experimental sessions to investigate the perceived utility and level of satisfaction with the CAs. Our results show that when users interact with a gentle and cooperative CA, their user satisfaction is higher. Furthermore, female CAs are perceived as more useful and satisfying to interact with than male CAs. We show that group performance improves through interaction with the CAs, confirming that a stereotype favoring the female with a gentle and cooperative personality combination exists in regard to perceived satisfaction, even though this does not lead to greater perceived utility. Our study extends the current debate about the possible correlation between CA characteristics and human acceptance and suggests future research to investigate the role of gender bias and related biases in human-AI teaming.
Published: 2023

19. The Tower of Babel in Explainable Artificial Intelligence (XAI)

Author: Holzinger, A, Kieseberg, P, Cabitza, F, Campagner, A, Tjoa, AM, Weippl, E, Schneeberger, D, Rottger, R, Plass, M, and Muller, H
Abstract: As machine learning (ML) has emerged as the predominant technological paradigm for artificial intelligence (AI), complex black box models such as GPT-4 have gained widespread adoption. Concurrently, explainable AI (XAI) has risen in significance as a counterbalancing force. But the rapid expansion of this research domain has led to a proliferation of terminology and an array of diverse definitions, making it increasingly challenging to maintain coherence. This confusion of languages also stems from the plethora of different perspectives on XAI, e.g. ethics, law, standardization and computer science. This situation threatens to create a “tower of Babel” effect, whereby a multitude of languages impedes the establishment of a common (scientific) ground. In response, this paper first maps different vocabularies, used in ethics, law and standardization. It shows that despite a quest for standardized, uniform XAI definitions, there is still a confusion of languages. Drawing lessons from these viewpoints, it subsequently proposes a methodology for identifying a unified lexicon from a scientific standpoint. This could aid the scientific community in presenting a more unified front to better influence ongoing definition efforts in law and standardization, often without enough scientific representation, which will shape the nature of AI and XAI in the future.
Published: 2023

20. Demo: Decision Support System Quality Assessment Tool

Author: Cabitza, F, Campagner, A, and Natali, C
Abstract: This demo showcases the DSS Quality Assessment Tool, an online service developed at the MUDI Lab (University of Milano-Bicocca, Milan, Italy) for the holistic assessment of the quality of AI-based decision support systems (DSS), along four quality dimensions: data similarity, calibration, robustness, and human interaction.
Published: 2023

21. Let Me Think! Investigating the Effect of Explanations Feeding Doubts About the AI Advice

Author: Cabitza, F, Campagner, A, Famiglini, L, Natali, C, Caccavella, V, and Gallazzi, E
Abstract: Augmented Intelligence (AuI) refers to the use of artificial intelligence (AI) to amplify certain cognitive tasks performed by human decision-makers. However, there are concerns that AI’s increasing capability and alignment with human values may undermine user agency, autonomy, and responsible decision-making. To address these concerns, we conducted a user study in the field of orthopedic radiology diagnosis, introducing a reflective XAI (explainable AI) support that aimed to stimulate human reflection, and we evaluated its impact in terms of decision performance, decision confidence and perceived utility. Specifically, the reflective XAI support system prompted users to reflect on the dependability of AI-generated advice by presenting evidence both in favor of and against its recommendation. This evidence was presented via two cases that closely resembled a given base case, along with pixel attribution maps. These cases were associated with the same AI advice for the base case, but one case was accurate while the other was erroneous with respect to the ground truth. While the introduction of this support system did not significantly enhance diagnostic accuracy, it was highly valued by more experienced users. Based on the findings of this study, we advocate for further research to validate the potential of reflective XAI in fostering more informed and responsible decision-making, ultimately preserving human agency.
Published: 2023

22. Towards a Rigorous Calibration Assessment Framework: Advancements in Metrics, Methods, and Use

Author: Famiglini, L., Campagner, A., and Cabitza, F.
Abstract: Calibration is paramount in developing and validating Machine Learning models, particularly in sensitive domains such as medicine. Despite its significance, existing metrics to assess calibration have been found to have shortcomings in regard to their interpretation and theoretical properties. This article introduces a novel and comprehensive framework to assess the calibration of Machine and Deep Learning models that addresses the above limitations. The proposed framework is based on a modification of the Expected Calibration Error (ECE), called the Estimated Calibration Index (ECI), which builds on and extends prior research. ECI was initially formulated for binary settings, and we adapted it to fit multiclass settings. ECI offers a more nuanced and informative measure, both locally and globally, of a model's tendency towards over/underconfidence. The paper first outlines the issues related to the prevalent definitions of ECE, including potential biases that may arise in the evaluation of their measures. Then, we present the results of a series of experiments conducted to demonstrate the effectiveness of the proposed framework in supporting a more accurate understanding of a model's calibration level. Additionally, we discuss how to address and potentially mitigate some biases in calibration assessment.
Published: 2023
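
Illustrative note: the abstract above takes the Expected Calibration Error (ECE) as its starting point. The sketch below shows one common binary formulation of ECE, using equal-width bins over the confidence in the predicted class. It is not the ECI proposed in the article, and the function name, the 0.5 decision threshold, and the default of 10 bins are assumptions chosen for illustration only.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE with equal-width confidence bins (illustrative sketch).

    probs  : predicted probabilities of the positive class, each in [0, 1]
    labels : true labels, each 0 or 1
    n_bins : number of equal-width bins over [0, 1] (assumed default)
    """
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and of equal length")

    n = len(probs)
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0            # assumed decision threshold
        conf = p if pred == 1 else 1.0 - p     # confidence in the predicted class
        correct = 1 if pred == y else 0
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, correct))

    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(k for _, k in b) / len(b)
        # Weight each bin's |accuracy - confidence| gap by its share of the samples.
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece
```

A value near 0 means that, within each bin, the average confidence matches the observed accuracy. This formulation does not indicate the direction of the miscalibration (over- versus underconfidence).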

23. Color Shadows 2: Assessing the Impact of XAI on Diagnostic Decision-Making

Author: Longo, L., Natali, C., Famiglini, L., Campagner, A., La Maida, G. A., Gallazzi, E., and Cabitza, F.
Abstract: A comprehensive assessment of the impact of eXplainable AI (XAI) on diagnostic decision-making should adopt a socio-technical perspective. Our study focuses on Decision Support Systems (DSS) that provide explanations in the form of Activation Maps, assessing their impact in terms of automation bias and algorithmic aversion. Specifically, we focus on the XAI-assisted task of detecting thoraco-lumbar fractures from X-rays by radiologists, taking into account the complexity of the cases and the experience level of users. Our results show how XAI support has a clear and positive impact on diagnostic performance. By introducing the concepts of technology impact, reliance patterns, and the white box paradox, we highlight the importance of designing Human-AI Collaboration Protocols (HAI-CP) that are specific to the task at hand to optimize the integration of XAI into diagnostic decision-making.
Published: 2023

24. Controllable AI - An Alternative to Trustworthiness in Complex AI Systems?

Author: Kieseberg, P., Weippl, E., Tjoa, A. M., Cabitza, F., Campagner, A., and Holzinger, A.
Abstract: The release of ChatGPT to the general public has sparked discussions about the dangers of artificial intelligence (AI) among the public. The European Commission’s draft of the AI Act has further fueled these discussions, particularly in relation to the definition of AI and the assignment of risk levels to different technologies. Security concerns in AI systems arise from the need to protect against potential adversaries and to safeguard individuals from AI decisions that may harm their well-being. However, ensuring secure and trustworthy AI systems is challenging, especially with deep learning models that lack explainability. This paper proposes the concept of Controllable AI as an alternative to Trustworthy AI and explores the major differences between the two. The aim is to initiate discussions on securing complex AI systems without sacrificing practical capabilities or transparency. The paper provides an overview of techniques that can be employed to achieve Controllable AI. It discusses the background definitions of explainability, Trustworthy AI, and the AI Act. The principles and techniques of Controllable AI are detailed, including detecting and managing control loss, implementing transparent AI decisions, and addressing intentional bias or backdoors. The paper concludes by discussing the potential applications of Controllable AI and its implications for real-world scenarios.
Published: 2023

25. Enhancing human-AI collaboration: The case of colonoscopy

Author: Introzzi, L., Zonca, J., Cabitza, F., Cherubini, P., and Reverberi, C.
Abstract: Diagnostic errors impact patient health and healthcare costs. Artificial Intelligence (AI) shows promise in mitigating this burden by supporting Medical Doctors in decision-making. However, the mere display of excellent or even superhuman performance by AI in specific tasks does not guarantee a positive impact on medical practice. Effective AI assistance should target the primary causes of human errors and foster effective collaborative decision-making with human experts who remain the ultimate decision-makers. In this narrative review, we apply these principles to the specific scenario of AI assistance during colonoscopy. By unraveling the neurocognitive foundations of the colonoscopy procedure, we identify multiple bottlenecks in perception, attention, and decision-making that contribute to diagnostic errors, shedding light on potential interventions to mitigate them. Furthermore, we explored how existing AI devices fare in clinical practice and whether they achieved an optimal integration with the human decision-maker. We argue that to foster optimal Human-AI collaboration, future research should expand our knowledge of factors influencing AI's impact, establish evidence-based cognitive models, and develop training programs based on them. These efforts will enhance human-AI collaboration, ultimately improving diagnostic accuracy and patient outcomes. The principles illuminated in this review hold more general value, extending their relevance to a wide array of medical procedures and beyond.
Published: 2023

26. Artificial Intelligence and liver: Opportunities and barriers

Author: Balsano, C., Burra, P., Duvoux, C., Alisi, A., Piscaglia, F., Gerussi, A., Brunetto, M. R., Bonino, F., Montalti, R., Campanile, S., Persico, M., Alvaro, D., Santini, S., Invernizzi, P., Carbone, M., Masarone, M., Eccher, A., Siciliano, B., Vento, M., Ficuciello, F., Cabitza, F., Penasa, S., and Donatelli, P.
Abstract: Artificial Intelligence (AI) has recently been shown as an excellent tool for the study of the liver; however, many obstacles still have to be overcome for the digitalization of real-world hepatology. The authors present an overview of the current state of the art on the use of innovative technologies in different areas (big data, translational hepatology, imaging, and transplant setting). In clinical practice, physicians must integrate a vast array of data modalities (medical history, clinical data, laboratory tests, imaging, and pathology slides) to achieve a diagnostic or therapeutic decision. Unfortunately, machine learning and deep learning are still far from really supporting clinicians in real life. In fact, the accuracy of any technological support has no value in medicine without the support of clinicians. To make better use of new technologies, it is essential to improve clinicians’ knowledge about them. To this end, the authors propose that collaborative networks for multidisciplinary approaches will improve the rapid implementation of AI systems for developing disease-customized AI-powered clinical decision support tools. The authors also discuss ethical, educational, and legal challenges that must be overcome to build robust bridges and deploy potentially effective AI in real-world clinical settings.
Published: 2023

27. Painting the Black Box White: Experimental Findings from Applying XAI to an ECG Reading Setting

Author: Cabitza, F., Campagner, A., Natali, C., Parimbelli, E., Ronzio, L., and Cameli, M.
Abstract: The emergence of black-box, subsymbolic, and statistical AI systems has motivated a rapid increase in the interest regarding explainable AI (XAI), which encompasses both inherently explainable techniques, as well as approaches to make black-box AI systems explainable to human decision makers. Rather than always making black boxes transparent, these approaches are at risk of painting the black boxes white, thus failing to provide a level of transparency that would increase the system’s usability and comprehensibility, or even at risk of generating new errors (i.e., white-box paradox). To address these usability-related issues, in this work we focus on the cognitive dimension of users’ perception of explanations and XAI systems. We investigated these perceptions in light of their relationship with users’ characteristics (e.g., expertise) through a questionnaire-based user study involving 44 cardiology residents and specialists in an AI-supported ECG reading task. Our results point to the relevance and correlation of the dimensions of trust, perceived quality of explanations, and tendency to defer the decision process to automation (i.e., technology dominance). This contribution calls for the evaluation of AI-based support systems from a human–AI interaction-oriented perspective, laying the ground for further investigation of XAI and its effects on decision making and user experience.
Published: 2023

28. The Effect of Holographic Heart Models and Mixed Reality for Anatomy Learning in Congenital Heart Disease: An Exploratory Study

Author: D'Aiello, A. F., Cabitza, F., Natali, C., Vigano, S., Ferrero, P., Bognoni, L., Pasqualin, G., Giamberti, A., and Chessa, M.
Abstract: In this paper, we present an exploratory study on the potential impact of holographic heart models and mixed reality technology on medical training, and in particular in teaching complex Congenital Heart Diseases (CHD) to medical students. Fifty-nine medical students were randomly allocated into three groups. Each participant in each group received a 30-minute lecture on a CHD condition interpretation and transcatheter treatment with different instructional tools. The participants of the first group attended a lecture in which traditional slides were projected onto a flat screen (group “regular slideware”, RS). The second group was shown slides incorporating videos of holographic anatomical models (group “holographic videos”, HV). Finally, those in the third group wore immersive, head-mounted devices (HMD) to interact directly with holographic anatomical models (group “mixed reality”, MR). At the end of the lecture, the members of each group were asked to fill in a multiple-choice questionnaire aimed at evaluating their topic proficiency, as a proxy to evaluate the effectiveness of the training session (in terms of acquired notions); participants from group MR were also asked to fill in a questionnaire regarding the recommendability and usability of the MS Hololens HMDs, as a proxy of satisfaction regarding its use experience (UX). The findings show promising results for usability and user acceptance.
Published: 2023

29. Assessment of Fast-Track Pathway in Hip and Knee Replacement Surgery by Propensity Score Matching on Patient-Reported Outcomes

Author: Campagner, A., Milella, F., Guida, S., Bernareggi, S., Banfi, G., and Cabitza, F.
Abstract: Total hip (THA) and total knee (TKA) arthroplasty procedures have steadily increased over the past few decades, and their use is expected to grow further, mainly due to an increasing number of elderly patients. Cost-containment strategies that support rapid recovery with positive functional outcomes, high patient satisfaction, and enhanced patient-reported outcomes are needed. A Fast Track surgical procedure (FT) is a coordinated perioperative approach aimed at expediting early mobilization and recovery following surgery and, accordingly, shortening the length of hospital stay (LOS), convalescence and costs. In this view, rapid rehabilitation surgery optimizes traditional rehabilitation methods by integrating evidence-based practices into the procedure. The aim of the present study was to compare the effectiveness of Fast Track versus Care-as-Usual surgical procedures and pathways (including rehabilitation) on a mid-term patient-reported outcome (PRO), the SF12 (with regard to both Physical and Mental Scores), 3 months after hip or knee replacement surgery, using propensity score matching (PSM) analysis to address the issue of the comparability of the groups in a non-randomized study. We were interested in the evaluation of the entire pathways, including the postoperative rehabilitation stage; therefore, we only used early home discharge as a surrogate to differentiate between the Fast Track and Care-as-Usual rehabilitation pathways. Our study shows that the entire Fast Track pathway, which includes the post-operative rehabilitation stage, has a significantly positive impact on physical health-related status (SF12 Physical Scores), as perceived by patients 3 months after hip or knee replacement surgery, as opposed to the standardized program, both in terms of the PROs score and the relative improvements observed, as compared with the minimum clinically important difference. This result encourages additional research into the effects of Fast Track reha
Published: 2023

30. Rams, hounds and white boxes: Investigating human–AI collaboration protocols in medical diagnosis

Author: Cabitza, F., Campagner, A., Ronzio, L., Cameli, M., Mandoli, G. E., Pastore, M. C., Sconfienza, L. M., Folgado, D., Barandas, M., and Gamboa, H.
Abstract: In this paper, we study human–AI collaboration protocols, a design-oriented construct aimed at establishing and evaluating how humans and AI can collaborate in cognitive tasks. We applied this construct in two user studies involving 12 specialist radiologists (the knee MRI study) and 44 ECG readers of varying expertise (the ECG study), who evaluated 240 and 20 cases, respectively, in different collaboration configurations. We confirm the utility of AI support but find that XAI can be associated with a “white-box paradox”, producing a null or detrimental effect. We also find that the order of presentation matters: AI-first protocols are associated with higher diagnostic accuracy than human-first protocols, and with higher accuracy than both humans and AI alone. Our findings identify the best conditions for AI to augment human diagnostic skills, rather than trigger dysfunctional responses and cognitive biases that can undermine decision effectiveness.
Published: 2023

31. Aggregation models in ensemble learning: A large-scale comparison

Author: Campagner, A., Ciucci, D., and Cabitza, F.
Abstract: In this work we present a large-scale comparison of 21 learning and aggregation methods proposed in the ensemble learning, social choice theory (SCT), information fusion and uncertainty management (IF-UM) and collective intelligence (CI) fields, based on a large collection of 40 benchmark datasets. The results of this comparison show that Bagging-based approaches reported performances comparable with XGBoost, and significantly outperformed other Boosting methods. In particular, ExtraTree-based approaches were as accurate as both XGBoost and Decision Tree-based ones while also being more computationally efficient. We also show how standard Bagging-based and IF-UM-inspired approaches outperformed the approaches based on CI and SCT. IF-UM-inspired approaches, in particular, reported the best performance (together with standard ExtraTrees), as well as the strongest resistance to label noise (together with XGBoost). Based on our results, we provide useful indications on the practical effectiveness of different state-of-the-art ensemble and aggregation methods in general settings.
Published: 2023

32. Quod erat demonstrandum? - Towards a typology of the concept of explanation for the design of explainable AI

Author: Cabitza, F., Campagner, A., Malgieri, G., Natali, C., Schneeberger, D., Stoeger, K., and Holzinger, A.
Abstract: In this paper, we present a fundamental framework for defining different types of explanations of AI systems and the criteria for evaluating their quality. Starting from a structural view of how explanations can be constructed, i.e., in terms of an explanandum (what needs to be explained), multiple explanantia (explanations, clues, or parts of information that explain), and a relationship linking explanandum and explanantia, we propose an explanandum-based typology and point to other possible typologies based on how explanantia are presented and how they relate to explananda. We also highlight two broad and complementary perspectives for defining possible quality criteria for assessing explainability: epistemological and psychological (cognitive). These definition attempts aim to support the three main functions that we believe should attract the interest and further research of XAI scholars: clear inventories, clear verification criteria, and clear validation methods.
Published: 2023

33. The Effect of Holographic Heart Models and Mixed Reality for Anatomy Learning in Congenital Heart Disease: An Exploratory Study

Author: D'Aiello, A. F., Cabitza, F., Natali, C., Vigano, S., Ferrero, P., Bognoni, L., Pasqualin, G., Giamberti, A., and Chessa, M.
Subjects: HoloLens, Medical education, Holographic image, Health Information Management, Medicine (miscellaneous), Health Informatics, Augmented reality, Mixed reality, Congenital heart disease, Information Systems
Abstract: In this paper, we present an exploratory study on the potential impact of holographic heart models and mixed reality technology on medical training, and in particular in teaching complex Congenital Heart Diseases (CHD) to medical students. Fifty-nine medical students were randomly allocated into three groups. Each participant in each group received a 30-minute lecture on a CHD condition interpretation and transcatheter treatment with different instructional tools. The participants of the first group attended a lecture in which traditional slides were projected onto a flat screen (group “regular slideware”, RS). The second group was shown slides incorporating videos of holographic anatomical models (group “holographic videos”, HV). Finally, those in the third group wore immersive, head-mounted devices (HMD) to interact directly with holographic anatomical models (group “augmented reality”, AR). At the end of the lecture, the members of each group were asked to fill in a multiple-choice questionnaire aimed at evaluating their topic proficiency, as a proxy to evaluate the effectiveness of the training session (in terms of acquired notions); participants from group AR were also asked to fill in a questionnaire regarding the recommendability and usability of the MS Hololens HMDs, as a proxy of satisfaction regarding its use experience (UX). The findings show promising results for usability and user acceptance.
Published: 2023

34. Rams, hounds and white boxes: Investigating human-AI collaboration protocols in medical diagnosis

Author: Cabitza, F., Campagner, A., Ronzio, L., Cameli, M., Mandoli, G. E., Pastore, M. C., Sconfienza, L. M., Folgado, D., Barandas, M., and Gamboa, H.
Subjects: Cognitive biases, Artificial intelligence, Automation bias, Explainable AI, Human–AI collaboration protocols, Settore MED/36 - Diagnostica per Immagini e Radioterapia, Settore INF/01 - Informatica, Medicine (miscellaneous)
Abstract: In this paper, we study human–AI collaboration protocols, a design-oriented construct aimed at establishing and evaluating how humans and AI can collaborate in cognitive tasks. We applied this construct in two user studies involving 12 specialist radiologists (the knee MRI study) and 44 ECG readers of varying expertise (the ECG study), who evaluated 240 and 20 cases, respectively, in different collaboration configurations. We confirm the utility of AI support but find that XAI can be associated with a “white-box paradox”, producing a null or detrimental effect. We also find that the order of presentation matters: AI-first protocols are associated with higher diagnostic accuracy than human-first protocols, and with higher accuracy than both humans and AI alone. Our findings identify the best conditions for AI to augment human diagnostic skills, rather than trigger dysfunctional responses and cognitive biases that can undermine decision effectiveness.
Published: 2023

35. Comparative Assessment of Two Data Visualizations to Communicate Medical Test Results Online

Author: Hurter, C., Purchase, H. C., Bouatouch, K., Cabitza, F., Campagner, A., and Conte, E.
Abstract: As most countries in the world still struggle to contain the COVID-19 outbreak, Data Visualization tools have become increasingly important to support decision-making under uncertain conditions. One of the challenges posed by the pandemic is the early diagnosis of COVID-19: to this end, machine learning models capable of detecting COVID-19 on the basis of hematological values have been developed and validated. This study aims to evaluate the potential of two Data Visualizations to effectively present the output of a COVID-19 diagnostic model online. Specifically, we investigated whether either visualization is better than the other in communicating COVID-19 test results in an effective and clear manner, both with respect to positivity and to the reliability of the test itself. The findings suggest that designing a visual tool for the general public in this application domain can be extremely challenging because of the need to render a wide array of outcomes that can be affected by varying levels of uncertainty.
Published: 2022

36. Comparing Handcrafted Features and Deep Neural Representations for Domain Generalization in Human Activity Recognition

Author: Bento, N., Rebelo, J., Barandas, M., Carreiro, A. V., Campagner, A., Cabitza, F., and Gamboa, H.
Abstract: Human Activity Recognition (HAR) has been studied extensively, yet current approaches are not capable of generalizing across different domains (i.e., subjects, devices, or datasets) with acceptable performance. This lack of generalization hinders the applicability of these models in real-world environments. As deep neural networks are becoming increasingly popular in recent work, there is a need for an explicit comparison between handcrafted and deep representations in Out-of-Distribution (OOD) settings. This paper compares both approaches in multiple domains using homogenized public datasets. First, we compare several metrics to validate three different OOD settings. In our main experiments, we then verify that even though deep learning initially outperforms models with handcrafted features, the situation is reversed as the distance from the training distribution increases. These findings support the hypothesis that handcrafted features may generalize better across specific domains.
Published: 2022

37. Re-calibrating Machine Learning Models Using Confidence Interval Bounds

Author: Campagner, A., Famiglini, L., and Cabitza, F.
Abstract: In this article we propose a novel technique for the re-calibration of Machine Learning (ML) models. This technique is based on the computation of confidence intervals for the probability scores provided by any ML model. Compared to existing and commonly used calibration methods, the proposed approach has two important advantages: first, under weak assumptions it provides theoretical guarantees about calibration; second, this method does not require any further data other than the training set used for ML model development. We illustrate the effectiveness of the proposed approach on a benchmark dataset for COVID-19 diagnosis, by comparing the proposed method against commonly used re-calibration techniques.
Published: 2022

38. Color Shadows (Part I): Exploratory Usability Evaluation of Activation Maps in Radiological Machine Learning

Author: Holzinger, A., Kieseberg, P., Min Tjoa, A., Weippl, E., Cabitza, F., Campagner, A., Famiglini, L., Gallazzi, E., and La Maida, G. A.
Abstract: Although deep learning-based AI systems for diagnostic imaging tasks have shown virtually superhuman accuracy, their use in medical settings has been questioned due to their non-interpretable, “black box” nature. To address this shortcoming, several methods have been proposed to make AI eXplainable (XAI), including Pixel Attribution Methods (PAMs); however, it is still unclear whether these methods are actually effective in “opening” the black box and improving diagnosis, particularly in tasks where pathological conditions are difficult to detect. In this study, we focus on the detection of thoraco-lumbar fractures from X-rays with the goal of assessing the impact of PAMs on diagnostic decision making by addressing two separate research questions: first, whether activation maps (AMs, as an instance of PAM) were perceived as useful in the aforementioned task; and, second, whether the maps were also capable of reducing the diagnostic error rate. We show that, even though AMs were not considered significantly useful by physicians, the image readers found high value in the maps in relation to other perceptual dimensions (i.e., pertinency, coherence) and, most importantly, their accuracy significantly improved when given XAI support in a pilot study involving 7 doctors in the interpretation of a small, but carefully chosen, set of images.
Published: 2022

39. Global Interpretable Calibration Index, a New Metric to Estimate Machine Learning Models’ Calibration

Author: Holzinger, A., Kieseberg, P., Tjoa, A. M., Weippl, E., Cabitza, F., Campagner, A., and Famiglini, L.
Abstract: The concept of calibration is key in the development and validation of Machine Learning models, especially in sensitive contexts such as the medical one. However, existing calibration metrics can be difficult to interpret and are affected by theoretical limitations. In this paper, we present a new metric, called GICI (Global Interpretable Calibration Index), which is characterized by being local and defined only in terms of simple geometrical primitives, which makes it both simpler to interpret and more general than other commonly used metrics, as it can also be used in recalibration procedures. Also, compared to traditional metrics, the GICI allows for a more comprehensive evaluation, as it provides three levels of information: a bin-level local estimate, a global one, and an estimate of the extent to which confidence scores are either over- or under-confident with respect to the actual error rate. We also report the results from experiments aimed at testing the above statements and giving insights about the practical utility of this metric, also for improving discriminative accuracy.
Published: 2022

40. A Confidence Interval-Based Method for Classifier Re-Calibration

Author: Campagner, A., Famiglini, L., and Cabitza, F.
Abstract: We propose a re-calibration method for Machine Learning models, based on computing confidence intervals for the predicted confidence scores. We show the effectiveness of the proposed method on a COVID-19 diagnosis benchmark.
Published: 2022

41. Decisions are not all equal—Introducing a utility metric based on case-wise raters’ perceptions

Author: Campagner, A., Sternini, F., and Cabitza, F.
Abstract: Background and Objective: Evaluation of AI-based decision support systems (AI-DSS) is of critical importance in practical applications; nonetheless, common evaluation metrics fail to properly consider relevant and contextual information. In this article we discuss a novel utility metric, the weighted Utility (wU), for the evaluation of AI-DSS, which is based on the raters’ perceptions of their annotation hesitation and of the relevance of the training cases. Methods: We discuss the relationship between the proposed metric and other previous proposals; and we describe the application of the proposed metric for both model evaluation and optimization, through three realistic case studies. Results: We show that our metric generalizes the well-known Net Benefit, as well as other common error-based and utility-based metrics. Through the empirical studies, we show that our metric can provide a more flexible tool for the evaluation of AI models. We also show that, compared to other optimization metrics, model optimization based on the wU can provide significantly better performance (AUC 0.862 vs 0.895, p-value <0.05), especially on cases judged to be more complex by the human annotators (AUC 0.85 vs 0.92, p-value <0.05). Conclusions: We make the point for having utility as a primary concern in the evaluation and optimization of machine learning models in critical domains, like the medical one; and for the importance of a human-centred approach to assess the potential impact of AI models on human decision making also on the basis of further information that can be collected during the ground-truthing process.
Published: 2022

42. The multicenter European Biological Variation Study (EuBIVAS): A new glance provided by the Principal Component Analysis (PCA), a machine learning unsupervised algorithms, based on the basic metabolic panel linked measurands

Author: Carobene, A., Campagner, A., Uccheddu, C., Banfi, G., Vidali, M., and Cabitza, F.
Abstract: The European Biological Variation Study (EuBIVAS), which includes 91 healthy volunteers from five European countries, estimated high-quality biological variation (BV) data for several measurands. Previous EuBIVAS papers reported no significant differences among laboratories/populations; however, they focused on specific sets of measurands, without a comprehensive overall view. The aim of this paper is to evaluate the homogeneity of EuBIVAS data considering multivariate information by applying Principal Component Analysis (PCA), an unsupervised machine learning algorithm. The EuBIVAS data for 13 basic metabolic panel linked measurands (glucose, albumin, total protein, electrolytes, urea, total bilirubin, creatinine, alkaline phosphatase, aminotransferases), age, sex, menopause, body mass index (BMI), country, alcohol, smoking habits, and physical activity have been used to generate three databases, developed using the traditional univariate and the multivariate Elliptic Envelope approaches to detect outliers, and different missing-value imputations. Two data matrices for each database, reporting both mean values and "within-person BV" (CVP) values for each measurand/subject, were analyzed using PCA. A clear clustering between male and female mean values has been identified, where the menopausal females are closer to the males. Data interpretations for the three databases are similar. No significant differences in either mean or CVP values have been found for countries, alcohol, smoking habits, BMI and physical activity. The absence of meaningful differences among countries confirms the EuBIVAS sample homogeneity and that the obtained data are widely applicable to deliver APS. Our data suggest that PCA and the multivariate approach may be used to detect outliers, although further studies are required.
Published: 2022

43. AI Shall Have No Dominion: on How to Measure Technology Dominance in AI-supported Human decision-making

Author: Schmidt, A., Väänänen, K., Cabitza, Federico, Campagner, Andrea, Angius, Riccardo, Natali, Chiara, and Reverberi, Franco
Abstract: In this article, we propose a conceptual and methodological framework for measuring the impact of the introduction of AI systems in decision settings, based on the concept of technological dominance, i.e. the influence that an AI system can exert on human judgment and decisions. We distinguish between a negative component of dominance (automation bias) and a positive one (algorithm appreciation) by focusing on and systematizing the patterns of interaction between human judgment and AI support, or reliance patterns, and their associated cognitive effects. We then define statistical approaches for measuring these dimensions of dominance, as well as corresponding qualitative visualizations. By reporting on four medical case studies, we illustrate how the proposed methods can be used to inform assessments of dominance and of related cognitive biases in real-world settings. Our study lays the groundwork for future investigations into the effects of introducing AI support into naturalistic and collaborative decision-making.
Published: 2023

44. Where is laboratory medicine headed in the next decade? Partnership model for efficient integration and adoption of artificial intelligence into medical laboratories

Author: Carobene, Anna, Cabitza, Federico, Bernardini, Sergio, Gopalan, Raj, Lennerz, Jochen K., Weir, Clare, and Cadamuro, Janne
Abstract: Objectives: The field of artificial intelligence (AI) has grown in the past 10 years. Despite the crucial role of laboratory diagnostics in clinical decision-making, we found that the majority of AI studies focus on surgery, radiology, and oncology, and there is little attention given to AI integration into laboratory medicine. Methods: We dedicated a session at the 3rd annual European Federation of Clinical Chemistry and Laboratory Medicine (EFLM) strategic conference in 2022 to the topic of AI in the laboratory of the future. The speakers collaborated on generating a concise summary of the content that is presented in this paper. Results: The five key messages are (1) Laboratory specialists and technicians will continue to improve the analytical portfolio, diagnostic quality and laboratory turnaround times; (2) The modularized nature of laboratory processes is amenable to AI solutions; (3) Laboratory sub-specialization continues and, from test selection to interpretation, tasks increase in complexity; (4) Expertise in AI implementation and partnerships with industry will emerge as a professional competency and require novel educational strategies for broad implementation; and (5) Regulatory frameworks and guidances have to be adapted to new computational paradigms. Conclusions: In summary, the speakers opine that the ability to convert the value proposition of AI in the laboratory will rely heavily on hands-on expertise and well-designed quality improvement initiatives from within the laboratory for improved patient care.
Published: 2023

45. Potentials and pitfalls of ChatGPT and natural-language artificial intelligence models for the understanding of laboratory medicine test results. An assessment by the European Federation of Clinical Chemistry and Laboratory Medicine (EFLM) Working Group on Artificial Intelligence (WG-AI)

Author: Cadamuro, Janne, Cabitza, Federico, Debeljak, Zeljko, De Bruyne, Sander, Frans, Glynis, Perez, Salomon Martin, Ozdemir, Habib, Tolios, Alexander, Carobene, Anna, and Padoan, Andrea
Abstract: Objectives: ChatGPT, a tool based on natural language processing (NLP), is on everyone's mind, and several potential applications in healthcare have already been proposed. However, since the ability of this tool to interpret laboratory test results has not yet been tested, the EFLM Working Group on Artificial Intelligence (WG-AI) has set itself the task of closing this gap with a systematic approach. Methods: WG-AI members generated 10 simulated laboratory reports of common parameters, which were then passed to ChatGPT for interpretation, according to reference intervals (RI) and units, using an optimized prompt. The results were subsequently evaluated independently by all WG-AI members with respect to relevance, correctness, helpfulness and safety. Results: ChatGPT recognized all laboratory tests; it could detect whether they deviated from the RI and gave a test-by-test as well as an overall interpretation. The interpretations were rather superficial, not always correct, and, only in some cases, judged coherently. The magnitude of the deviation from the RI seldom plays a role in the interpretation of laboratory tests, and artificial intelligence (AI) did not make any meaningful suggestion regarding follow-up diagnostics or further procedures in general. Conclusions: ChatGPT in its current form, not being specifically trained on medical data or laboratory data in particular, may only be considered a tool capable of interpreting a laboratory report on a test-by-test basis at best, but not of interpreting an overall diagnostic picture. Future generations of similar AIs with medical ground truth training data might surely revolutionize current processes in healthcare, although this implementation is not ready yet.
Published: 2023

46. Editorial: Clinical Integration of Artificial Intelligence in Spine Surgery: Stepping in a new Frontier

Author: Gallazzi, Enrico, La Maida, Giovanni Andrea, and Cabitza, Federico
Published: 2023

47. The HIBAD Experience: Using Digital Health Technologies in the GDPR Era

Author: Ferri, Alessandro, Agrati, Simone, Cabitza, Federico, Colombo, Riccardo, Filetti, Sebastiano, Galeone, Carlotta, Lettieri, Emanuele, Mariani, Paolo, Nobile, Maria, Pattini, Linda, Sfreddo, Eleonora, and Molteni, Massimo
Abstract: Conducting clinical research presents multiple challenges in data privacy, safeguarding patients’ rights and freedom, and data and sample management. These challenges intensify when the research involves genetic information, biobanks or big health database and, more recently, digital technologies, such as machine learning (ML) and artificial intelligence (AI). This article discusses how some of these issues will be addressed in “HUB Regionale Integrato Biobanca Analisi Dati e Utilizzo Sperimentale” (HIBAD), a project that aims to create an infrastructure to support different clinical research projects through digital health tools. The core of the HIBAD project will be a Regional Biological Resource Center (CRRB), consisting of a biobank and an integrated database of shared clinical records, to combine information from multiple sources, such as electronic clinical records, images, genomics, wearable devices, biological samples. This massive collection of data and samples will be used as a basis for real-world evidence studies, in-silico clinical trials, new clinical trials, and further classification of data through AI and ML techniques. The project also foresees a dedicated digital infrastructure consisting of standardized protocols, procedures, global informed consent and ethics support to ensure respect for ethics, and privacy and regulatory requirements, which would potentially help the creation of high-quality data for valuable research studies. Public Interest Summary Clinical research is extremely important for discovering new treatments for diseases but can be rather complex, especially considering the existence of multiple rules to follow. The project “HUB Regionale Integrato Biobanca Analisi Dati e Utilizzo Sperimentale” (HIBAD) aims to simplify some of the challenges of modern clinical research and to help researchers conducting clinical trials. First of all, we aim to create a big database where data coming from different sources (e.g. patient's health re
Published: 2023

48. Everything is varied: The surprising impact of instantial variation on ML reliability

Author: Campagner, Andrea, Famiglini, Lorenzo, Carobene, Anna, and Cabitza, Federico
Abstract: Instantial variation (IV) refers to variation that is due not to population differences or errors, but rather to within-subject variation, that is the intrinsic and characteristic patterns of variation pertaining to a given instance or the measurement process. Although taking into account IV is critical for the proper analysis of the results, this source of uncertainty and its impact on robustness have so far been neglected in Machine Learning (ML). To fill this gap, we look at how IV affects ML performance and generalization, and how its impact can be mitigated. Specifically, we provide a methodological contribution to formalize the problem of IV in the statistical learning framework. To prove the relevance of our contribution, we focus on one of the most critical domains, healthcare, and take individual (analytical and biological) variation as a specific kind of IV; in this domain, we use one of the largest real-world laboratory medicine datasets for the task of COVID-19 detection, to show that: (1) common state-of-the-art ML models are severely impacted by the presence of IV in data; and (2) advanced learning strategies, based on data augmentation and soft computing methods (data imprecisiation), and proper study designs can be effective at improving robustness to IV. Our findings demonstrate the critical relevance of correctly accounting for IV to enable safe deployment of ML in real-world settings.
Published: 2023

49. Toward a Perspectivist Turn in Ground Truthing for Predictive Computing

Author: Williams, B., Chen, Y., Neville, J., Cabitza, F., Campagner, A., and Basile, V.
Published: 2023

50. Care and Enterprise Systems: An Archeology of Case Management

Author: Cabitza, F., Viscusi, G., D'Atri, Alessandro, editor, Ferrara, Maria, editor, George, Joey F., editor, and Spagnoletti, Paolo, editor
Published: 2011