A novel clustering-based purity and distance imputation for handling medical data with missing values.

Authors :
Cheng, Ching-Hsue
Huang, Shu-Fen
Source :
Soft Computing - A Fusion of Foundations, Methodologies & Applications. Sep2021, Vol. 25 Issue 17, p11781-11801. 21p.
Publication Year :
2021

Abstract

People are paying increasing attention to health, and the integrity of medical records has come into focus. Medical data imputation has recently become a very active field because medical data usually contain missing values. Many imputation methods have been proposed, but many model-based methods, such as expectation–maximization and regression-based imputation, assume that the variables follow a multivariate normal distribution, an assumption that can lead to biased results. Such methods can also become a bottleneck, since they are computationally more complex than model-free methods. Furthermore, directly removing instances with missing values has several problems: it can lose important data, produce ineffective research samples, and cause research deviations. Therefore, this study proposes a novel clustering-based purity and distance imputation method to improve the handling of missing values. In the experiment, we collected eight different medical datasets to compare the proposed imputation method with the listed imputation methods under different conditions. As imputation measures, the area under the curve (AUC) is used to evaluate performance on the imbalanced-class datasets in the missing at random (MAR) and missing completely at random (MCAR) experiments, and accuracy is used to measure performance on the balanced-class datasets in the missing not at random (MNAR) experiment. Finally, the root-mean-square error (RMSE) is also used to compare the proposed and the listed imputation methods. In addition, this study utilized the elbow method and the average silhouette method to find the optimal number of clusters for all datasets. Results showed that the proposed imputation method could improve imputation performance in terms of accuracy, AUC, and RMSE across different degrees and types of missingness. [ABSTRACT FROM AUTHOR]
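
The abstract does not give the details of the purity and distance steps, so the sketch below is not the authors' method. It only illustrates the general idea of clustering-based imputation scored by RMSE under MCAR-style masking, using a generic nearest-centroid fill. The synthetic data, the function names (`kmeans`, `impute`, `rmse`), the 10% missing rate, and the choice of k = 2 are all assumptions made for illustration.

```python
import math
import random


def kmeans(rows, k, iters=20, seed=0):
    """Plain k-means on fully observed rows; returns a list of k centroids."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for r in rows:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(r, centroids[c])),
            )
            groups[nearest].append(r)
        for c, g in enumerate(groups):
            if g:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def impute(rows, centroids):
    """Fill None cells with the matching value of the nearest centroid.

    Distance is computed over the observed features of each row only.
    (Generic stand-in; the paper's purity/distance rule is not reproduced.)
    """
    filled = []
    for r in rows:
        obs = [i for i, v in enumerate(r) if v is not None]
        best = min(centroids, key=lambda c: sum((r[i] - c[i]) ** 2 for i in obs))
        filled.append([best[i] if v is None else v for i, v in enumerate(r)])
    return filled


def rmse(truth, filled, mask):
    """Root-mean-square error over the artificially removed cells only."""
    errs = [(truth[i][j] - filled[i][j]) ** 2 for i, j in mask]
    return math.sqrt(sum(errs) / len(errs))


# Synthetic data (assumption): two well-separated groups, three features.
rng = random.Random(1)
truth = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(50)]
truth += [[rng.gauss(10, 1) for _ in range(3)] for _ in range(50)]

# MCAR-style masking: each cell is removed independently with probability 0.1.
mask = [(i, j) for i in range(len(truth)) for j in range(3) if rng.random() < 0.1]
data = [list(r) for r in truth]
for i, j in mask:
    data[i][j] = None

# Cluster the complete rows, then impute every row from its nearest centroid.
complete = [r for r in data if None not in r]
centroids = kmeans(complete, k=2)
filled = impute(data, centroids)
score = rmse(truth, filled, mask)
print(f"RMSE on masked cells: {score:.3f}")
```

In a fuller experiment, k would be chosen with the elbow or average silhouette method rather than fixed, and the RMSE would be compared against baselines such as global mean imputation at several missing rates.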

Details

Language :
English
ISSN :
14327643
Volume :
25
Issue :
17
Database :
Academic Search Index
Journal :
Soft Computing - A Fusion of Foundations, Methodologies & Applications
Publication Type :
Academic Journal
Accession number :
151819489
Full Text :
https://doi.org/10.1007/s00500-021-05947-3