1. The impact of isolation kernel on agglomerative hierarchical clustering algorithms.
- Author
- Han, Xin; Zhu, Ye; Ting, Kai Ming; and Li, Gang
- Subjects
- *HIERARCHICAL clustering (Cluster analysis), *ALGORITHMS, *MACHINE learning, *HIERARCHICAL Bayes model
- Abstract
• Providing the condition under which an AHC does not effectively extract clusters.
• Introducing the entanglement to measure the dendrogram quality.
• Identifying the root cause of a density bias of traditional AHC algorithms.
• Improving AHC performance with a data-dependent kernel.

Agglomerative hierarchical clustering (AHC) is one of the popular clustering approaches. AHC generates a dendrogram that provides richer information and insights from a dataset than partitioning clustering. However, a major problem with existing distance-based AHC methods is that they fail to effectively identify adjacent clusters with varied densities, regardless of the cluster extraction methods applied to the resultant dendrogram. This paper aims to reveal the root cause of this issue and provides a solution by using a data-dependent kernel. We analyse the condition under which existing AHC methods fail to effectively extract clusters, and give the reason why the data-dependent kernel is an effective remedy. This leads to a new approach to kernelise existing hierarchical clustering algorithms, including the traditional AHC algorithms, HDBSCAN, GDL, PHA and HC-OT. Our extensive empirical evaluation shows that the recently introduced Isolation Kernel produces a higher-quality or purer dendrogram than distance, Gaussian Kernel and adaptive Gaussian Kernel in all the above-mentioned AHC algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2023