
Detecting Meaningful Clusters From High-Dimensional Data: A Strongly Consistent Sparse Center-Based Clustering Approach.

Authors :
Chakraborty, Saptarshi
Das, Swagatam
Source :
IEEE Transactions on Pattern Analysis & Machine Intelligence. Jun 2022, Vol. 44 Issue 6, p2894-2908. 15p.
Publication Year :
2022

Abstract

In context to high-dimensional clustering, the concept of feature weighting has gained considerable importance over the years to capture the relative degrees of importance of different features in revealing the cluster structure of the dataset. However, the popular techniques in this area either fail to perform feature selection or do not preserve the simplicity of Lloyd’s heuristic to solve the $k$-means problem and the like. In this paper, we propose a Lasso Weighted $k$-means ($LW$-$k$-means) algorithm, as a simple yet efficient sparse clustering procedure for high-dimensional data where the number of features ($p$) can be much higher than the number of observations ($n$). The $LW$-$k$-means method imposes an $\ell_1$ regularization term involving the feature weights directly to induce feature selection in a sparse clustering framework. We develop a simple block-coordinate descent type algorithm with time-complexity resembling that of Lloyd’s method, to optimize the proposed objective. In addition, we establish the strong consistency of the $LW$-$k$-means procedure. Such an analysis of the large sample properties is not available for the conventional sparse $k$-means algorithms, in general. $LW$-$k$-means is tested on a number of synthetic and real-life datasets and through a detailed experimental analysis, we find that the performance of the method is highly competitive against the baselines as well as the state-of-the-art procedures for center-based high-dimensional clustering, not only in terms of clustering accuracy but also with respect to computational time. [ABSTRACT FROM AUTHOR]
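
The abstract does not state the $LW$-$k$-means objective, so the following is only a minimal sketch of the general idea it describes: Lloyd-style block-coordinate descent in which feature weights carry an $\ell_1$ penalty. The objective used here is an assumption chosen for illustration, not the paper's: minimize $\sum_l w_l (W_l - T_l) + \lambda \sum_l w_l + \tfrac{1}{2} \sum_l w_l^2$ over assignments, centers, and weights $w_l \ge 0$, where $W_l$ and $T_l$ are the within-cluster and total mean squared deviations of feature $l$. All function and parameter names (`lasso_weighted_kmeans_sketch`, `soft_threshold`, `lam`) are hypothetical.

```python
import numpy as np


def soft_threshold(a, lam):
    """Non-negative soft-thresholding: max(a - lam, 0), elementwise."""
    return np.maximum(a - lam, 0.0)


def objective(X, labels, centers, w, lam, total_var):
    """Illustrative (assumed) objective; NOT the objective from the paper."""
    within = ((X - centers[labels]) ** 2).mean(axis=0)
    return float(np.sum(w * (within - total_var))
                 + lam * np.sum(w) + 0.5 * np.sum(w ** 2))


def lasso_weighted_kmeans_sketch(X, k, lam, n_iter=20, seed=0):
    """Block-coordinate descent over assignments, centers, and feature weights."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    total_var = X.var(axis=0)  # T_l: total mean squared deviation per feature
    centers = X[rng.choice(n, size=k, replace=False)].copy()
    w = np.ones(p)             # start with every feature active
    history = []
    for _ in range(n_iter):
        # Block 1: assign each point to the nearest center in weighted distance.
        dist = (((X[:, None, :] - centers[None, :, :]) ** 2) * w).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Block 2: update centers to cluster means (empty clusters keep theirs).
        for j in range(k):
            members = labels == j
            if members.any():
                centers[j] = X[members].mean(axis=0)
        # Block 3: closed-form weight update; the l1 term zeroes weak features.
        within = ((X - centers[labels]) ** 2).mean(axis=0)
        w = soft_threshold(total_var - within, lam)
        history.append(objective(X, labels, centers, w, lam, total_var))
    return labels, centers, w, history
```

Under this assumed objective, each block is minimized exactly, so the recorded objective values never increase from one sweep to the next. Each sweep costs on the order of $n \cdot k \cdot p$ operations, the same order as one Lloyd iteration, and a larger `lam` drives more feature weights to exactly zero.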

Details

Language :
English
ISSN :
01628828
Volume :
44
Issue :
6
Database :
Academic Search Index
Journal :
IEEE Transactions on Pattern Analysis & Machine Intelligence
Publication Type :
Academic Journal
Accession number :
156742187
Full Text :
https://doi.org/10.1109/TPAMI.2020.3047489