1. Optimising criteria for manual smear review following automated blood count analysis: A machine learning approach
- Author
- Marta Avalos, Hélène Touchais, Marcela Henríquez-Henríquez, Ajith Abraham, Hideyasu Sasaki, Ricardo Rios, Niketa Gandhi, Umang Singh, Kun Ma. Affiliations: Université de Bordeaux (UB); Bordeaux population health (BPH); Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED); Institut National de la Santé et de la Recherche Médicale (INSERM); Statistics In System biology and Translational Medicine (SISTM); Inria Bordeaux - Sud-Ouest; Institut National de Recherche en Informatique et en Automatique (Inria); Institut National Polytechnique (Toulouse) (Toulouse INP); Université Fédérale Toulouse Midi-Pyrénées; Université de Toulouse (UT); IntegraMedica; British United Provident Association Chile (BUPA Chile)
- Subjects
[SDV.MHEP.HEM] Life Sciences [q-bio]/Human health and pathology/Hematology, [SDV.SPEE] Life Sciences [q-bio]/Santé publique et épidémiologie, [STAT.AP] Statistics [stat]/Applications [stat.AP], [STAT.CO] Statistics [stat]/Computation [stat.CO], [STAT.ME] Statistics [stat]/Methodology [stat.ME], [STAT.ML] Statistics [stat]/Machine Learning [stat.ML], [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], Decision support system, Computer science, Stability (learning theory), Feature selection, Context (language use), Logistic regression, Machine learning, 01 natural sciences, 010104 statistics & probability, 03 medical and health sciences, 0302 clinical medicine, Interpretability, 0101 mathematics, Data mining, Selection (genetic algorithm), Population Health, Model selection, Machine Learning for Healthcare Applications, GAM, Imbalance, Categorisation of continuous variables, 030220 oncology & carcinogenesis, Artificial intelligence, Lasso
- Abstract
The complete blood count (CBC) performed by automated haematology analysers is the most common clinical procedure in the world. Used for health check-ups, diagnosis and patient follow-up, the CBC impacts the majority of medical decisions. If the analysis does not fit an expected setting, the laboratory staff manually reviews a blood smear, which is highly time-consuming. Criteria for reviewing CBCs are based on international consensus guidelines and locally adjusted to account for laboratory resources and population characteristics. Our objective is to provide a clinical laboratory decision support tool to identify which CBC variables are linked to an increased risk of abnormal manual smear and at which threshold values. Thus, we treat criteria adjustment as a feature selection problem. We propose a cost-sensitive Lasso-penalised additive logistic regression combined with stability selection, adapted to the peculiarities of the data and context: class imbalance, categorisation of continuous predictors, required stability and enhanced interpretability. Using simulated and real CBC data, we show that our proposal is competitive in terms of predictive performance (compared to deep neural networks) and model selection performance (provided that there is sufficient data in the neighbourhood of the true thresholds). The R code is publicly available as an open source project.
- Published
- 2021