
Testing of Clustering.

Authors :
Alon, Noga
Dar, Seannie
Parnas, Michal
Ron, Dana
Source :
SIAM Review. 2004, Vol. 46 Issue 2, p285-308. 24p. 4 Diagrams.
Publication Year :
2004

Abstract

In this work we study the problem of clustering with respect to the diameter and the radius costs: We say that a set X of points in R^d is (k,b)-clusterable with respect to the diameter cost if X can be partitioned into k subsets (clusters) so that the distance between every pair of points in each cluster is at most b. In the case of the radius cost we require that all points that belong to the same cluster be at a distance of at most b from some common central point. Here we approach the problem of clustering from within the framework of property testing. In property testing, the goal is to determine whether a given object has a particular property or whether it should be modified significantly so that it obtains the property. In the context of clustering, testing takes on the following form: The algorithm is given parameters k, b, β, and ε, and it can sample from the set of points X. The goal of the algorithm is to distinguish between the case when X is (k,b)-clusterable and the case when X is ε-far from being (k,(1+β)b)-clusterable. By ε-far from being (k,(1+β)b)-clusterable we mean that more than ε·|X| points should be removed from X so that it becomes (k,(1+β)b)-clusterable. In this work we describe and analyze algorithms that use a sample of size polynomial in k and 1/ε and independent of |X|. (The dependence on β and on the dimension, d, of the points varies with the different algorithms.) Such algorithms may be especially useful when the set of points X is very large and it may not even be feasible to observe all of it. Our algorithms can also be used to find approximately good clusterings. Namely, these are clusterings of all but an ε-fraction of the points in X that have optimal (or close to optimal) cost. The benefit of our algorithms is that they construct an implicit representation of such clusterings in time independent of |X|... [ABSTRACT FROM AUTHOR]
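
As an illustration only, the testing setup described in the abstract can be sketched for the simplest case: points on the real line (d = 1) under the diameter cost. The sketch draws a uniform sample and accepts if and only if the sample itself is (k,b)-clusterable. The function names and the sample_size parameter are hypothetical and are not taken from the paper; in particular, sample_size is left to the caller and is not the sample-size bound analyzed by the authors, and the parameters β and ε do not appear in the code.

```python
import random


def is_diameter_clusterable_1d(points, k, b):
    """Return True if the 1-D points fit into at most k clusters of diameter <= b.

    On the line a greedy sweep is optimal: open a new cluster at the
    leftmost uncovered point and let it absorb every point within b of it.
    """
    clusters = 0
    start = None  # leftmost point of the cluster currently being filled
    for p in sorted(points):
        if start is None or p - start > b:
            clusters += 1
            start = p
            if clusters > k:
                return False
    return True


def sample_tester_1d(points, k, b, sample_size, rng=None):
    """Hypothetical sampling tester: accept iff a uniform sample is (k,b)-clusterable.

    sample_size is an assumption supplied by the caller, not the paper's bound.
    """
    rng = rng or random.Random()
    m = min(sample_size, len(points))
    sample = rng.sample(points, m)  # uniform sample without replacement
    return is_diameter_clusterable_1d(sample, k, b)
```

Because every subset of a (k,b)-clusterable set is itself (k,b)-clusterable, a tester of this shape never rejects a clusterable input; any error it makes is on inputs that are far from clusterable but whose sample happens to look clusterable.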

Details

Language :
English
ISSN :
00361445
Volume :
46
Issue :
2
Database :
Academic Search Index
Journal :
SIAM Review
Publication Type :
Academic Journal
Accession number :
13376907
Full Text :
https://doi.org/10.1137/S0036144503437178