1. Machine learning techniques to predict different levels of hospital care of CoVid-19
- Author
Brais Cancela-Barizo, Elena Hernández-Pereira, Verónica Bolón-Canedo, Amparo Alonso-Betanzos, Bertha Guijarro-Berdiñas, and Oscar Fontenla-Romero
- Subjects
Coronavirus disease 2019 (COVID-19), Computer science, Feature selection, Machine learning, Intensive care unit, Article, Hospital care, Data set, CoVid-19, Artificial intelligence, Clinical history, Hospital admission, Supervised classification, Medical history
- Abstract
Funded for open access publication: Universidade da Coruña/CISUG. In this study, we analyze the capability of several state-of-the-art machine learning methods to predict whether patients diagnosed with CoVid-19 (CoronaVirus disease 2019) will need different levels of hospital care (regular hospital admission or intensive care unit admission) during the course of their illness, using only demographic and clinical data. For this research, a data set of 10,454 patients from 14 hospitals in Galicia (Spain) was used. Each patient is characterized by 833 variables: two are age and gender, and the rest are records of diseases or conditions in the patient's medical history. In addition, each patient's history of hospital or intensive care unit (ICU) admissions due to CoVid-19 is available. This clinical history serves to label each patient and thus makes it possible to assess the predictions of the models. Our aim is to identify which model delivers the best accuracy for both hospital and ICU admissions using only demographic variables and some structured clinical data, and to identify which of those variables are most relevant in each case. The results of the experimental study show that the best models are those that use oversampling as a preprocessing phase to balance the class distribution. Using these models and all the available features, we achieved an area under the curve (AUC) of 76.1% and 80.4% for predicting the need for hospital and ICU admission, respectively. Furthermore, feature selection and oversampling techniques were applied, and it was experimentally verified that the relevant variables for the classification are age and gender: using only these two features does not degrade the performance of the models on either prediction problem.
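The two ideas the abstract leans on, oversampling to balance the classes and AUC as the evaluation measure, can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not say which oversampling method or classifiers were used, so plain random oversampling stands in here, and the function names and toy data are hypothetical. In practice, oversampling would be applied to the training split only.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class (assumed variant; the
    abstract does not name the oversampling method)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def auc(scores, labels):
    """AUC as the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: [age, gender]; label 1 = admitted.
rows = [[34, 0], [41, 1], [29, 0], [55, 1], [62, 0], [38, 1], [81, 1], [77, 0]]
labels = [0, 0, 0, 0, 0, 0, 1, 1]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))  # class counts after balancing

# Age alone used as a score, purely to show how the AUC is computed.
print(auc([row[0] for row in rows], labels))
```

The pairwise AUC above is quadratic in the number of examples; it is written that way for readability, and a rank-based computation would be preferable on a data set of the size described.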
This research has been supported by GAIN (Galician Innovation Agency) and the Regional Ministry of Economy, Employment and Industry, Xunta de Galicia, grant COV20/00604, through ERDF funds. It was made possible by the support of the Xunta de Galicia (Dirección Xeral de Saúde Pública), which provided the anonymized patient data. It has also been supported by the Xunta de Galicia (grants ED431C 2018/34 and IN845D 2020/26 of the Axencia Galega de Innovación) with European Union ERDF funds. CITIC, as a Research Center accredited by the Galician University System, is funded by the Consellería de Cultura, Educación e Universidades of the Xunta de Galicia, with 80% provided through ERDF funds (ERDF Operational Programme Galicia 2014-2020) and the remaining 20% by the Secretaría Xeral de Universidades (grant ED431G 2019/01). Finally, we would also like to thank Prof. Ricardo Cao, Chairman of the Committee of Experts for Mathematical Action against Coronavirus, for his kind request to collaborate in this project.
- Published
- 2021