Ensemble Learning Models Based on Noninvasive Features for Type 2 Diabetes Screening: Model Development and Validation

Authors :
Yang, Tianzhou
Zhang, Li
Yi, Liwei
Feng, Huawei
Li, Shimeng
Chen, Haoyu
Zhu, Junfeng
Zhao, Jian
Zeng, Yingyue
Liu, Hongsheng
Source :
JMIR Medical Informatics, Vol 8, Iss 6, p e15431 (2020)
Publication Year :
2020
Publisher :
JMIR Publications, 2020.

Abstract

Background: Early diabetes screening can effectively reduce the burden of disease. However, natural population-based screening projects require a large number of resources. With the emergence and development of machine learning, researchers have started to pursue more flexible and efficient methods to screen or predict type 2 diabetes.

Objective: The aim of this study was to build prediction models based on the ensemble learning method for diabetes screening to further improve the health status of the population in a noninvasive and inexpensive manner.

Methods: The dataset for building and evaluating the diabetes prediction model was extracted from the National Health and Nutrition Examination Survey from 2011-2016. After data cleaning and feature selection, the dataset was split into a training set (80%, 2011-2014), test set (20%, 2011-2014) and validation set (2015-2016). Three simple machine learning methods (linear discriminant analysis, support vector machine, and random forest) and easy ensemble methods were used to build diabetes prediction models. The performance of the models was evaluated through 5-fold cross-validation and external validation. The Delong test (2-sided) was used to test the performance differences between the models.

Results: We selected 8057 observations and 12 attributes from the database. In the 5-fold cross-validation, the three simple methods yielded highly predictive performance models with areas under the curve (AUCs) over 0.800, wherein the ensemble methods significantly outperformed the simple methods. When we evaluated the models in the test set and validation set, the same trends were observed. The ensemble model of linear discriminant analysis yielded the best performance, with an AUC of 0.849, an accuracy of 0.730, a sensitivity of 0.819, and a specificity of 0.709 in the validation set.

Conclusions: This study indicates that efficient screening using machine learning methods with noninvasive tests can be applied to a large population and achieve the objective of secondary prevention.
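
The abstract reports accuracy, sensitivity, and specificity for the validation set. As a reading aid, the following minimal Python sketch shows how these three quantities are conventionally computed from confusion-matrix counts. The function name `screening_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not taken from the study and do not reproduce its results.

```python
def screening_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp: people with diabetes flagged by the model (true positives)
    fn: people with diabetes missed by the model (false negatives)
    tn: people without diabetes correctly cleared (true negatives)
    fp: people without diabetes flagged by the model (false positives)
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / (positives + negatives),
        "sensitivity": tp / positives,   # share of true cases detected
        "specificity": tn / negatives,   # share of non-cases correctly cleared
    }


# Hypothetical counts for illustration only (not data from the study).
example = screening_metrics(tp=80, fp=60, tn=240, fn=20)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

In a screening setting, sensitivity is often the quantity of most interest, because a missed case (false negative) forgoes the early intervention that screening is meant to enable.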

Details

Language :
English
ISSN :
2291-9694
Volume :
8
Issue :
6
Database :
Directory of Open Access Journals
Journal :
JMIR Medical Informatics
Publication Type :
Academic Journal
Accession number :
edsdoj.9cbf85428907484f8caa18ac59996c4b
Document Type :
article
Full Text :
https://doi.org/10.2196/15431