Katarzyna Nabrdalik, Hanna Kwiendacz, Karolina Drożdż, Krzysztof Irlik, Mirela Hendel, Agata M. Wijata, Jakub Nalepa, Elon Correa, Weronika Hajzler, Oliwia Janota, Wiktoria Wójcik, Janusz Gumprecht, and Gregory Y.H. Lip
We aimed to develop a machine learning (ML) model for predicting cardiovascular (CV) events in patients with diabetes (DM). This was a prospective, observational study in which clinical data of patients with diabetes hospitalized in a diabetology center in Poland (2015-2020) were analyzed using ML. New CV events following discharge were recorded over a follow-up of up to 5 years and 9 months. We proposed an end-to-end ML technique that uses neighborhood component analysis to identify discriminative predictors, followed by a hybrid sampling/boosting classification algorithm, multiple logistic regression (MLR), or unsupervised hierarchical clustering. Of 1735 patients with diabetes (53% female), 150 (8.65%) had a new CV event during follow-up. The twelve most discriminative patient parameters were coronary artery disease, heart failure, peripheral artery disease, stroke, diabetic foot disease, chronic kidney disease, eosinophil count, serum potassium level, and treatment with clopidogrel, heparin, proton pump inhibitor, and loop diuretic. Using these variables, the area under the receiver operating characteristic curve (AUC) ranged from 0.62 (95% Confidence Interval [CI] 0.56-0.68, P < 0.01) to 0.72 (95% CI 0.66-0.77, P < 0.01) across 5 nonoverlapping test folds, whereas MLR correctly identified 111/150 (74.00%) high-risk patients and 989/1585 (62.40%) low-risk patients, for a total of 1100/1735 (63.40%) correctly classified patients (AUC: 0.72, 95% CI 0.66-0.77). ML algorithms can identify patients with diabetes at high risk of new CV events based on a small number of interpretable and easy-to-obtain patient parameters.
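The percentages reported for MLR follow directly from the stated counts, and the short sketch below reproduces that arithmetic. The function name `classification_summary` and the labels "sensitivity" and "specificity" are illustrative assumptions and do not appear in the abstract; only the four counts are taken from the text.

```python
# Counts reported in the abstract for multiple logistic regression (MLR).
TRUE_POSITIVES = 111    # high-risk patients correctly identified
TOTAL_POSITIVES = 150   # patients with a new CV event during follow-up
TRUE_NEGATIVES = 989    # low-risk patients correctly identified
TOTAL_NEGATIVES = 1585  # patients without a new CV event


def classification_summary(tp, n_pos, tn, n_neg):
    """Return event rate, sensitivity, specificity and accuracy as fractions.

    Hypothetical helper for illustration; the names are assumptions.
    """
    total = n_pos + n_neg
    return {
        "event_rate": n_pos / total,    # share of patients with a new CV event
        "sensitivity": tp / n_pos,      # correctly identified high-risk patients
        "specificity": tn / n_neg,      # correctly identified low-risk patients
        "accuracy": (tp + tn) / total,  # all correctly classified patients
    }


summary = classification_summary(
    TRUE_POSITIVES, TOTAL_POSITIVES, TRUE_NEGATIVES, TOTAL_NEGATIVES
)
for name, value in summary.items():
    print(f"{name}: {value:.2%}")
```

Under these assumptions the printed values match the abstract: 8.65% of patients had an event, 74.00% of high-risk and 62.40% of low-risk patients were correctly identified, and 63.40% of all patients were correctly classified.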