Comparison of 3 classification algorithms for diabetes prediction in the United States.
- Author
-
Kang, Yong Ting and Nugroho, Yusuf Sulistyo
- Subjects
-
*DECISION trees, *K-nearest neighbor classification, *DIABETES, *CLASSIFICATION algorithms, *FEATURE selection, *DATA scrubbing, *PEOPLE with diabetes, *FORECASTING
- Abstract
Purpose: This study aims to identify the best classification model for diabetes prediction in the United States. Methodology: We apply three classification algorithms, namely Naïve Bayes, Decision Tree and K-Nearest Neighbour, to the healthcare problem of early detection of diabetes. The data are extracted from Kaggle and pre-processed to ensure they are clean. The dataset is analysed using exploratory data analysis (EDA), and the correlation between the parameters is studied. Unsuitable parameters are dropped to improve feature selection and modelling. Each model is evaluated using several metrics: accuracy score, classification report, MSE and RMSE. After the models are built, their hyperparameters are tuned to increase the accuracy of each model. Grid search is one of the functions used to search for the best hyperparameters for each model. The results are tabulated and compared for further discussion. Results: The findings show that the decision tree model performs best after hyperparameter tuning, with the highest test accuracy score (74.24%), the lowest MSE (0.2576) and RMSE (0.5075), and strong precision and recall for both the no-diabetes and diabetes classes. These evaluation results indicate that the decision tree is the best of the three algorithms for predicting diabetes in Americans. [ABSTRACT FROM AUTHOR]
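The three headline metrics in the abstract are closely related, which a short sketch can make concrete. Assuming the class labels are encoded as 0 (no diabetes) and 1 (diabetes), which the abstract does not state, the MSE of hard predictions equals the misclassification rate, and the RMSE is its square root. The toy labels and helper names below are hypothetical and are not the study's data or code; only the three reported figures come from the abstract.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error; for 0/1 labels each error contributes exactly 1."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels (0 = no diabetes, 1 = diabetes), not the study's data.
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 0]

acc = accuracy(y_true, y_pred)
err = mse(y_true, y_pred)
rmse = math.sqrt(err)

# With 0/1 labels, accuracy and MSE always sum to 1.
print(f"accuracy={acc:.4f}  mse={err:.4f}  rmse={rmse:.4f}")

# Figures reported in the abstract for the tuned decision tree.
reported_accuracy = 0.7424
reported_mse = 0.2576
reported_rmse = 0.5075

# Values implied by the relationships above.
implied_mse = 1 - reported_accuracy
implied_rmse = math.sqrt(reported_mse)
print(f"implied mse={implied_mse:.4f}  implied rmse={implied_rmse:.4f}")
```

Under that encoding assumption, the reported MSE and RMSE follow directly from the reported accuracy, so they do not provide independent evidence beyond the accuracy score itself. Precision and recall from the classification report are what add information about per-class behaviour.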
- Published
- 2024