The fast clustering algorithm for the big data based on K-means.

Authors :: Xie, Ting
Zhang, Taiping
Source :: International Journal of Wavelets, Multiresolution & Information Processing. Nov2020, Vol. 18 Issue 6, pN.PAG-N.PAG. 15p.
Publication Year :: 2020
Abstract :: As a powerful unsupervised learning technique, clustering is a fundamental task in big data analysis. However, many traditional clustering algorithms perform poorly, in both computational efficiency and clustering accuracy, on big data that is high-dimensional, sparse and noisy. To alleviate these problems, this paper presents a Feature K-means clustering model on the feature space of big data and introduces a fast algorithm for it based on the Alternating Direction Method of Multipliers (ADMM). We show that the Feature K-means model in the original space is equivalent to the model in the feature space, and we prove the convergence of its iterative algorithm. In experiments, we compare Feature K-means with Spherical K-means and Kernel K-means on several benchmark data sets, including artificial data and four face databases. The experiments show that the proposed approach is comparable to state-of-the-art algorithms for big data clustering. [ABSTRACT FROM AUTHOR]
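
Note :: For orientation, the sketch below illustrates Spherical K-means, one of the baselines named in the abstract. It is not the authors' Feature K-means model or their ADMM-based solver, neither of which is specified in this record. The function names (`normalize`, `spherical_kmeans`) and the `init_indices` parameter are hypothetical and chosen only for illustration. It assumes the textbook formulation: points are projected onto the unit sphere, assigned to the centroid with the largest cosine similarity, and each centroid is re-normalized after averaging.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def spherical_kmeans(points, init_indices, max_iter=100):
    """Cluster points by direction (cosine similarity) rather than by distance.

    points       -- list of equal-length numeric vectors
    init_indices -- indices of the points used as initial centroids
                    (a hypothetical, deterministic seeding for this sketch)
    Returns (labels, centroids).
    """
    data = [normalize(p) for p in points]
    centroids = [data[i] for i in init_indices]
    labels = None
    for _ in range(max_iter):
        # Assignment step: on the unit sphere the dot product is the cosine
        # similarity, so each point goes to the centroid with the largest one.
        new_labels = [
            max(
                range(len(centroids)),
                key=lambda j: sum(a * b for a, b in zip(x, centroids[j])),
            )
            for x in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move
        labels = new_labels
        # Update step: average the members, then project back onto the sphere.
        for j in range(len(centroids)):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[j] = normalize(mean)
    return labels, centroids
```

Calling `spherical_kmeans(points, init_indices=[0, 2])` on a small list of 2-D vectors returns one cluster label per point together with unit-length centroids. Because only direction matters, vectors that differ greatly in magnitude but point the same way fall into the same cluster.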

Subjects :: *ALGORITHMS
*DATABASES
*K-means clustering
*SINGULAR value decomposition

Language :: English
ISSN :: 0219-6913
Volume :: 18
Issue :: 6
Database :: Academic Search Index
Journal :: International Journal of Wavelets, Multiresolution & Information Processing
Publication Type :: Academic Journal
Accession number :: 147476960
Full Text :: https://doi.org/10.1142/S0219691320500538