1. Object-based cluster validation with densities.
- Author
- Tavakkol, Behnam; Choi, Jeongsub; Jeong, Myong Kee; Albin, Susan L.
- Subjects
- *ALGORITHMS, *DENSITY, *STATISTICS
- Abstract
• In this paper, an object-based clustering validity index with densities, referred to as OCVD, is proposed. This index uses the densities of clusters to capture the exclusive contribution of each data object to both the separation and the compactness of clusters.
• OCVD is superior to many existing clustering validity indices that capture the properties of clusters by using representative statistics such as mean, variance, and diameter. Those indices might not capture the full characteristics of clusters with arbitrary shapes, whereas OCVD is capable of doing so.
• Although some density-based validity indices exist, studies show that they have problems such as poor performance on clusters with arbitrary shapes that are not perfectly separated, and poor performance caused by relying on only some representative data objects in clusters.
Clustering validity indices are typically used as tools to find the correct number of clusters in a data set and/or to evaluate the quality of the clusters formed by clustering algorithms. Clustering validity indices measure the separation and compactness of clusters. Typically, when applying a clustering algorithm, the input includes the number of clusters. After applying the algorithm with several different numbers of clusters, we determine the number of clusters to be the one with the best validity index. There are two types of clustering validity indices: external indices, which are supervised, and internal indices, which are unsupervised. The focus of this paper is on internal validity indices. Some existing internal validity indices capture the properties of the clusters by using representative statistics such as mean, variance, and diameter; however, these do not perform well when clusters have arbitrary shapes. One approach to overcome this issue is to use the density of the data objects in each cluster. That provides the advantage of capturing the full characteristics of the cluster, which is most beneficial when there are clusters with arbitrary shapes. In the literature, a few density-based clustering validity indices have been proposed. However, some of them show poor performance when the clusters are not perfectly separated. Others perform poorly because they use only representative objects from each cluster instead of all objects. The contribution of this paper is an internal validity index named the object-based clustering validity index with densities (OCVD). OCVD is a single number that averages the density-based contribution of individual data objects to both the separation and the compactness of clusters. The density-based contributions of the objects are calculated using kernel density estimation. We show through several experiments that OCVD performs well in detecting the correct number of clusters in data sets with different cluster shapes, including arbitrary shapes. [ABSTRACT FROM AUTHOR]
- Published
- 2022
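The selection procedure described in the abstract (run a clustering algorithm for several candidate numbers of clusters, then keep the number with the best internal validity index) can be sketched as follows. This is only an illustration and not the OCVD formula, which the abstract does not give. The function `toy_density_index`, the bandwidth value, the synthetic data, and the small `kmeans` helper are all assumptions made for the example. The toy index averages, over all objects, a leave-one-out Gaussian kernel density estimate under the object's own cluster minus its highest density under any other cluster.

```python
import numpy as np

def gaussian_kernel_matrix(X, bandwidth):
    """Pairwise Gaussian kernel values K[i, j] between the rows of X."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    norm = (2.0 * np.pi * bandwidth ** 2) ** (X.shape[1] / 2.0)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2)) / norm

def toy_density_index(X, labels, bandwidth=0.5):
    """Hypothetical density-based index (NOT OCVD); higher is better.

    For each object: KDE density under its own cluster (leave-one-out)
    minus the highest KDE density under any other cluster, averaged
    over all objects.
    """
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("need at least two non-empty clusters")
    K = gaussian_kernel_matrix(X, bandwidth)
    np.fill_diagonal(K, 0.0)  # an object does not contribute to its own density
    n = len(X)
    dens = np.zeros((n, len(clusters)))
    for col, c in enumerate(clusters):
        members = labels == c
        # Number of kernel terms per object: exclude the object itself if it is a member.
        size = members.sum() - members.astype(int)
        sums = K[:, members].sum(axis=1)
        dens[:, col] = np.where(size > 0, sums / np.maximum(size, 1), 0.0)
    rows = np.arange(n)
    own_col = np.searchsorted(clusters, labels)
    own = dens[rows, own_col].copy()
    dens[rows, own_col] = -np.inf  # mask own cluster before taking the max
    other = dens.max(axis=1)
    return float(np.mean(own - other))

def kmeans(X, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Synthetic data (assumed for illustration): three well-separated 2-D blobs.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in blob_centers])

# Score each candidate number of clusters and keep the best one.
scores = {k: toy_density_index(X, kmeans(X, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Because the toy index uses unscaled density differences, its value depends on the chosen bandwidth and on the scale of the data. It also relies on k-means, so unlike the index described in the abstract it is not meant to handle arbitrary cluster shapes.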