1. Performance assessment of different machine learning approaches in predicting diabetic ketoacidosis in adults with type 1 diabetes using electronic health records data
- Author
Chuang-Chung Lee, Kristen Sharma, Zoran Doder, Cliona Molony, Fang Liz Zhou, Evgeny Zalmover, Lin Li, Chuntao Wu, and Juhaeri Juhaeri
- Subjects
Adult, AUC, Area under the curve, Artificial intelligence, Confidence interval, Cross-validation, Diabetes Mellitus, Type 1, Diabetic Ketoacidosis, Electronic Health Records, Epidemiology, Humans, Lasso (statistics), least absolute shrinkage and selection operator, Logistic Models, Logistic regression, Machine learning, Medicine, Original Article, Pharmacology (medical), pharmacology & pharmacy, medical and health sciences, clinical medicine, general & internal medicine, prediction model, Random forest, Test set, Type 1 diabetes
- Abstract
Purpose: To assess the performance of different machine learning (ML) approaches in identifying risk factors for diabetic ketoacidosis (DKA) and in predicting DKA.
Methods: This study applied flexible ML approaches (XGBoost, distributed random forest [DRF] and a feedforward network) and conventional ML approaches (logistic regression and least absolute shrinkage and selection operator [LASSO]) to 3,400 DKA cases and 11,780 controls nested in a cohort of adults with type 1 diabetes identified from the Optum® de-identified Electronic Health Record dataset (2007-2018). Area under the curve (AUC), accuracy, sensitivity and specificity were computed using 5-fold cross-validation, and their 95% confidence intervals (CI) were established using 1,000 bootstrap samples. The importance of predictors was compared across these models.
Results: In the training set, XGBoost and the feedforward network yielded higher AUC values (0.89 and 0.86, respectively) than logistic regression (0.83), LASSO (0.83) and DRF (0.81). In the test set, however, the AUC values were similar across all approaches (0.82; 95% CI range, 0.80-0.84). While the accuracy values were >0.8 and the specificity values were >0.9 for all models, the sensitivity values were only 0.4. The differences in these metrics across models were minimal in the test set. All approaches selected some known risk factors for DKA among their top ten features. XGBoost and DRF included more laboratory measurements or vital signs than the conventional ML approaches, while the feedforward network included more sociodemographic features.
Conclusions: In our empirical study, all ML approaches demonstrated similar performance and identified overlapping, but different, top ten predictors. The difference in selected top predictors needs further research.
- Published
- 2021
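
The abstract reports 95% confidence intervals for AUC obtained from 1,000 bootstrap samples. The abstract does not state which software or which bootstrap variant was used, so the following is only a minimal sketch of one common choice, the percentile bootstrap, assuming that held-out labels (1 = DKA case, 0 = control) and predicted scores are already available. The function names `auc` and `bootstrap_auc_ci` are hypothetical and are not taken from the study.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win. This pairwise form is O(cases * controls);
    a rank-based formula would be preferable for large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC (assumed variant, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample observations with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class; AUC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(round((alpha / 2) * n_boot))]
    upper = stats[int(round((1 - alpha / 2) * n_boot)) - 1]
    return lower, upper
```

Calling `bootstrap_auc_ci(labels, scores)` returns a `(lower, upper)` pair. A fixed `seed` keeps the interval reproducible between runs.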