Local Similarity Imputation Based on Fast Clustering for Incomplete Data in Cyber-Physical Systems
- Source :
- IEEE Systems Journal. 12:1610-1620
- Publication Year :
- 2018
- Publisher :
- Institute of Electrical and Electronics Engineers (IEEE), 2018.
Abstract
- Missing values are common in cyber-physical systems (CPS) for a variety of reasons, such as sensor faults, communication malfunctions, environmental interference, and human error. Accurate missing value imputation is crucial for improving data quality in data mining and statistical analysis tasks. Unfortunately, most existing methods use the whole data set to impute a missing value, so irrelevant records can degrade the imputed results (low accuracy or high complexity). To address this problem, this paper develops a novel local similarity imputation method that estimates missing data based on fast clustering and top $k$-nearest neighbors. To improve the imputation accuracy, a two-layer stacked autoencoder combined with distinctive imputation is applied to locate the principal features of a dataset for clustering. Then, top $k$-nearest neighbor hybrid distance weighted imputation is applied to fill in missing values within clusters. The proposed method is evaluated on five popular University of California Irvine datasets as well as one air quality monitoring dataset collected from CPS, in comparison with four high-quality existing imputation methods. Empirical results show that the proposed scheme can impute missing data values effectively and efficiently, especially for incomplete data with local characteristics in CPS.
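
The final step described in the abstract (filling a missing value from the top $k$ nearest records in the same cluster, weighted by distance) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes cluster labels are already available (the paper obtains them via fast clustering on stacked-autoencoder features, which is not reproduced here), and it substitutes plain Euclidean distance over the observed features with inverse-distance weights for the paper's "hybrid distance", whose definition the abstract does not give. The function name `knn_impute_within_clusters` and its parameters are hypothetical.

```python
import numpy as np

def knn_impute_within_clusters(X, labels, k=3, eps=1e-9):
    """Fill NaNs in X using the k nearest rows from the same cluster.

    Assumption: Euclidean distance on the target row's observed features and
    inverse-distance weights stand in for the paper's hybrid distance.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    filled = X.copy()

    # Visit every row that has at least one missing entry.
    for i in np.flatnonzero(np.isnan(X).any(axis=1)):
        missing = np.isnan(X[i])
        observed = ~missing

        # Candidate donors: all other rows in the same cluster.
        donors = np.flatnonzero(labels == labels[i])
        donors = donors[donors != i]

        for j in np.flatnonzero(missing):
            # A donor must have feature j and every feature row i observes.
            ok = np.array(
                [d for d in donors
                 if not np.isnan(X[d, j]) and not np.isnan(X[d, observed]).any()],
                dtype=int,
            )
            if ok.size == 0:
                continue  # no usable donor: leave the gap unfilled

            # Distance is measured on row i's observed features only.
            dist = np.linalg.norm(X[ok][:, observed] - X[i, observed], axis=1)
            nearest = np.argsort(dist)[:k]

            # Closer donors get larger weights; eps avoids division by zero.
            weights = 1.0 / (dist[nearest] + eps)
            filled[i, j] = np.average(X[ok[nearest], j], weights=weights)

    return filled
```

Restricting donors to the same cluster is what makes the imputation "local": records from unrelated clusters never enter the distance computation, which is the source of both the accuracy and the cost savings the abstract claims over whole-data-set methods.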
- Subjects :
- Fuzzy clustering
Computer Networks and Communications
Correlation clustering
02 engineering and technology
Machine learning
CURE data clustering algorithm
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
Imputation (statistics)
Electrical and Electronic Engineering
Cluster analysis
k-medians clustering
Mathematics
Missing data
Computer Science Applications
Data stream clustering
Control and Systems Engineering
020201 artificial intelligence & image processing
Artificial intelligence
Data mining
Information Systems
Details
- ISSN :
- 2373-7816 and 1932-8184
- Volume :
- 12
- Database :
- OpenAIRE
- Journal :
- IEEE Systems Journal
- Accession number :
- edsair.doi...........c45f38f3d82566074f4629e13a32266b