1. Towards Better Ways to Assess Predictive Computing in Medicine: On Reliability, Robustness, and Utility
- Author
- Carpentieri, B, Lecca, P, Cabitza, F, and Campagner, A
- Abstract
Computational classification systems built using machine learning (ML) techniques are increasingly being evaluated and employed in medical settings for a number of purposes and applications, including diagnosis, prognosis, and risk stratification. However, evaluation and validation practices that are commonly adopted when ML is applied to other disciplines are unlikely to be meaningfully applicable to medicine. In fact, otherwise technically sound systems have been found to perform poorly in real settings, a problem that has been termed the “last mile of implementation.” In this chapter, we focus on three main factors underlying the so-called last mile: the impact of observer variability on ground truth reliability; the meaningfulness and appropriateness of commonly adopted performance measures; and the issue of replicability in ML studies. We discuss these issues and delineate possible solutions and concepts to address them.
- Published
- 2024