1. Order-Sensitive Imputation for Clustered Missing Values.
- Author
- Ma, Qian; Gu, Yu; Lee, Wang-Chien; Yu, Ge
- Subjects
- *MISSING data (Statistics), *MACHINE learning, *MATHEMATICAL optimization, *HEURISTIC algorithms, *ESTIMATION theory
- Abstract
Missing values (MVs) appear widely in real-world datasets and hinder the use of many statistical and machine learning algorithms for data analytics, because these algorithms cannot handle incomplete datasets. To address this issue, several MV imputation algorithms have been developed. However, these approaches do not perform well when most of the incomplete tuples are clustered with each other, coined here as the Clustered Missing Values Phenomenon, which is attributable to the lack of sufficient complete tuples near an MV for imputation. In this paper, we propose the Order-Sensitive Imputation for Clustered Missing values (OSICM) framework, in which missing values are imputed sequentially such that the values filled earlier in the process are also used for later imputation of other MVs. The order of imputations is therefore critical to the effectiveness and efficiency of the OSICM framework. We formulate the search for the optimal imputation order as an optimization problem and show that it is NP-hard. Furthermore, we devise an algorithm to find the exact optimal solution and propose two approximate/heuristic algorithms that trade off effectiveness for efficiency. Finally, we conduct extensive experiments on real and synthetic datasets to demonstrate the superiority of our OSICM framework. [ABSTRACT FROM AUTHOR]
- Published
- 2019
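
The sequential idea in the abstract, where values filled earlier become available to later imputations, can be illustrated with a small sketch. This is not the OSICM algorithm from the paper: the ordering rule (fewest missing attributes first), the k-nearest-neighbour mean, and the names `distance` and `impute_sequentially` are all assumptions made for illustration only.

```python
import math


def distance(a, b):
    """Root-mean-square difference over attributes known in both tuples."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # nothing to compare on
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def impute_sequentially(rows, k=2):
    """Fill None entries one tuple at a time, reusing earlier fills.

    Assumed ordering heuristic (NOT the paper's): process the tuple with
    the fewest missing attributes first, breaking ties by row index.
    """
    rows = [list(r) for r in rows]  # work on a copy
    pending = [i for i, r in enumerate(rows) if None in r]
    order = []
    while pending:
        i = min(pending, key=lambda idx: (rows[idx].count(None), idx))
        pending.remove(i)
        order.append(i)
        observed = list(rows[i])  # distances use only the observed attributes
        for j, value in enumerate(observed):
            if value is not None:
                continue
            # Donors are any other tuples with attribute j known, which
            # includes tuples whose attribute j was imputed in an earlier step.
            donors = [r for idx, r in enumerate(rows)
                      if idx != i and r[j] is not None]
            donors.sort(key=lambda r: distance(observed, r))
            nearest = donors[:k]
            if nearest:
                rows[i][j] = sum(r[j] for r in nearest) / len(nearest)
    return rows, order


if __name__ == "__main__":
    data = [[0.0, 10.0], [1.0, None], [2.0, None]]
    filled, order = impute_sequentially(data, k=1)
    print(order)
    print(filled)
```

In this toy input only the first tuple is complete, so the third tuple's nearest donor is the second tuple, whose value was itself imputed one step earlier. Changing the processing order changes which donors are available, which is the sensitivity the abstract describes.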