1. Machine Learning for the Prediction of New-Onset Diabetes Mellitus during 5-Year Follow-up in Non-Diabetic Patients with Cardiovascular Risks
- Author
- Seung-Woon Rha, Ji Young Park, Suhng Wook Kim, Yung-Kyun Noh, Jun Hyuk Kang, and Byoung Geol Choi
- Subjects
Male, 030204 cardiovascular system & hematology, Machine learning, computer.software_genre, Logistic regression, Standard deviation, Machine Learning, 03 medical and health sciences, Endocrinology & Metabolism, 0302 clinical medicine, big data, Risk Factors, Type 2 diabetes mellitus, Republic of Korea, Medicine, Humans, diabetes, business.industry, Medical record, Area under the curve, Type 2 Diabetes Mellitus, Reproducibility of Results, General Medicine, prediction, Quadratic classifier, Middle Aged, Linear discriminant analysis, Logistic Models, Diabetes Mellitus, Type 2, ROC Curve, Cardiovascular Diseases, 030220 oncology & carcinogenesis, Original Article, Female, Artificial intelligence, business, computer, Algorithms, Non diabetic, Follow-Up Studies
- Abstract
PURPOSE: Many studies have proposed predictive models for type 2 diabetes mellitus (T2DM). However, these models have several limitations, such as limited user convenience and reproducibility. The purpose of this study was to develop a T2DM predictive model using electronic medical records (EMRs) and machine learning, and to compare its performance with that of traditional statistical methods.
MATERIALS AND METHODS: A total of 8454 patients who had no history of diabetes and were treated at the cardiovascular center of Korea University Guro Hospital were enrolled. All subjects completed 5 years of follow-up. The prevalence of T2DM during follow-up was 4.78% (404/8454). A total of 28 variables were extracted from the EMRs. To evaluate the prediction models by cross-validation, logistic regression (LR), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and K-nearest neighbor (KNN) models were generated. The LR model was treated as the existing statistical analysis method.
RESULTS: All predictive models maintained a change within the standard deviation of area under the curve (AUC)
- Published
- 2019
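
The abstract above evaluates its classifiers by cross-validation and reports the area under the curve (AUC) together with its standard deviation. The following is a minimal sketch of that kind of evaluation, not the authors' implementation. It covers only a KNN classifier, runs on synthetic two-feature data rather than EMR variables, and uses only the Python standard library. The function names, the choice of 5 folds, k = 15, and the 10% positive rate are all assumptions made for illustration; the abstract does not state them.

```python
import random
import statistics


def auc(scores, labels):
    """AUC via the rank-sum identity: the fraction of (positive, negative)
    pairs in which the positive scores higher; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def knn_score(train_x, train_y, x, k=15):
    """Fraction of positives among the k nearest training points
    (squared Euclidean distance)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_x, train_y)
    )[:k]
    return sum(y for _, y in nearest) / k


def stratified_folds(labels, n_folds, rng):
    """Split indices into folds while keeping the class ratio similar."""
    folds = [[] for _ in range(n_folds)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % n_folds].append(i)
    return folds


def cross_validated_auc(x, y, n_folds=5, k=15, seed=0):
    """Return (mean, standard deviation) of the per-fold AUC."""
    rng = random.Random(seed)
    aucs = []
    for fold in stratified_folds(y, n_folds, rng):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        train_x = [x[i] for i in train]
        train_y = [y[i] for i in train]
        scores = [knn_score(train_x, train_y, x[i], k) for i in fold]
        aucs.append(auc(scores, [y[i] for i in fold]))
    return statistics.mean(aucs), statistics.stdev(aucs)


def make_synthetic(n=600, positive_rate=0.1, seed=1):
    """Hypothetical data: two Gaussian features, shifted for positives."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < positive_rate else 0
        x.append([rng.gauss(1.0 * label, 1.0), rng.gauss(0.5 * label, 1.0)])
        y.append(label)
    return x, y


if __name__ == "__main__":
    features, outcomes = make_synthetic()
    mean_auc, sd_auc = cross_validated_auc(features, outcomes)
    print(f"KNN 5-fold AUC: {mean_auc:.3f} +/- {sd_auc:.3f}")
```

In practice, a library such as scikit-learn would likely be used to fit LR, LDA, QDA, and KNN under the same folds. The standard-library version here is only meant to make the per-fold AUC and its standard deviation concrete.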