
Faster balanced clusterings in high dimension.

Authors :
Ding, Hu
Source :
Theoretical Computer Science. Nov2020, Vol. 842, p28-40. 13p.
Publication Year :
2020

Abstract

The problem of constrained clustering has attracted significant attention in the past decades. In this paper, we study the balanced k-center, k-median, and k-means clustering problems where the size of each cluster is constrained by the given lower and upper bounds. The problems are motivated by the applications in processing large-scale data in high dimension. Existing methods often need to compute complicated matchings (or min cost flows) to satisfy the balance constraint, and thus suffer from high complexities especially in high dimension. We develop an effective framework for the three balanced clustering problems to address this issue, and our method is based on a novel spatial partition idea in geometry. For the balanced k-center clustering, we provide a 4-approximation algorithm that improves the existing approximation factors; for the balanced k-median and k-means clusterings, our algorithms yield constant and (1 + ϵ)-approximation factors with any ϵ > 0. More importantly, our algorithms achieve linear or nearly linear running times when k is a constant, and significantly improve the existing ones. Our results can be easily extended to metric balanced clusterings and the running times are sub-linear in terms of the complexity of n-point metric. [ABSTRACT FROM AUTHOR]
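
To make the problem statement in the abstract concrete, the following Python sketch checks the balance constraint and evaluates the three objectives for a given clustering. It illustrates the problem definition only and is not the algorithm from the paper. The function names (`is_balanced`, `k_center_cost`, `k_median_cost`, `k_means_cost`) and the use of Euclidean distance are assumptions made for this example.

```python
import math
from collections import Counter


def is_balanced(assignment, k, lower, upper):
    """True if each of the k clusters has between lower and upper points."""
    sizes = Counter(assignment)
    return all(lower <= sizes.get(j, 0) <= upper for j in range(k))


def k_center_cost(points, centers, assignment):
    """k-center objective: largest distance from a point to its center."""
    return max(math.dist(p, centers[c]) for p, c in zip(points, assignment))


def k_median_cost(points, centers, assignment):
    """k-median objective: sum of distances from points to their centers."""
    return sum(math.dist(p, centers[c]) for p, c in zip(points, assignment))


def k_means_cost(points, centers, assignment):
    """k-means objective: sum of squared distances to the centers."""
    return sum(math.dist(p, centers[c]) ** 2 for p, c in zip(points, assignment))


# Hypothetical toy instance: four points on a line, two centers.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]
assignment = [0, 0, 1, 1]  # assignment[i] is the cluster index of points[i]

print(is_balanced(assignment, k=2, lower=2, upper=2))
print(k_center_cost(points, centers, assignment))
print(k_median_cost(points, centers, assignment))
print(k_means_cost(points, centers, assignment))
```

A balanced clustering algorithm must choose both the centers and the assignment so that `is_balanced` holds while the chosen objective stays small; the size bounds are what force the matching or min cost flow step that the abstract says existing methods rely on.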

Details

Language :
English
ISSN :
03043975
Volume :
842
Database :
Academic Search Index
Journal :
Theoretical Computer Science
Publication Type :
Academic Journal
Accession number :
146169519
Full Text :
https://doi.org/10.1016/j.tcs.2020.07.022