Back to Search Start Over

Interpretable Data-Driven Approach Based on Feature Selection Methods and GAN-Based Models for Cardiovascular Risk Prediction in Diabetic Patients

Authors :
David Chushig-Muzo
Hugo Calero-Diaz
Francisco J. Lara-Abelenda
Vanesa Gomez-Martinez
Conceicao Granja
Cristina Soguero-Ruiz
Source :
IEEE Access, Vol 12, Pp 84292-84305 (2024)
Publication Year :
2024
Publisher :
IEEE, 2024.

Abstract

Noncommunicable diseases (NCDs) are the leading cause of morbidity and mortality worldwide. Cardiovascular diseases (CVDs) and diabetes are the most prevalent NCDs, causing 1.9 and 1.5 million deaths yearly. Individuals diagnosed with type 1 diabetes (T1D) are at high risk of developing CVDs. Machine learning (ML) models have provided outstanding results in different domains, including healthcare, allowing to obtain models with high predictive performance. The aim of this study was to develop an interpretable data-driven approach to predict the 10-year CVD risk for T1D older individuals, aiming to provide both reasonable predictive performance and the identification of risk factors associated with CVDs. Data from T1D individuals at the Steno Diabetes Center Copenhagen were used. Different ML-based models were considered, including KNN, decision tree, random forest, and multilayer perceptron (MLP). To enhance the predictive performance of ML models, the conditional tabular generative adversarial network (CTGAN) was used to create synthetic data and increase the size of the training data. Several filter and wrapper feature selection (FS) techniques were considered for identifying the most relevant features involved in CVD risk and enhancing the performance of the ML-based models used. To gain interpretability on predictive models, we used the post-hoc methods: SHAP and accumulated local effects. The experimental results showed a great performance of FS and ML-based models for predicting CVD risk. In particular, the MLP obtained the best results, with a mean absolute error of 0.0088 and mean relative absolute error of 0.0817. Regarding risk factors, age, Hba1c, and albuminuria were identified as crucial in CVD risk prediction, which is in line with recent clinical evidence. Our study contributes to identifying CVD risk and associated risk factors in a data-driven manner, helping to make early interventions and adequate treatments to prevent CVDs.

Details

Language :
English
ISSN :
21693536
Volume :
12
Database :
Directory of Open Access Journals
Journal :
IEEE Access
Publication Type :
Academic Journal
Accession number :
edsdoj.42d884569fd94a18bfbfa53524e55c16
Document Type :
article
Full Text :
https://doi.org/10.1109/ACCESS.2024.3412789