An Effective Clustering Method over CF$^+$ Tree Using Multiple Range Queries.

Authors :
Ryu, Hyeong-Cheol
Jung, Sungwon
Pramanik, Sakti
Source :
IEEE Transactions on Knowledge & Data Engineering. Sep 2020, Vol. 32, Issue 9, pp. 1694-1706. 13 pp.
Publication Year :
2020

Abstract

Many existing clustering methods usually compute clusters from the reduced data sets obtained by summarizing the original very large data sets. BIRCH is a popular summary-based clustering method that first builds a CF tree, and then performs a global clustering using the leaf entries of the tree. However, to the best of our knowledge, no prior studies have proposed a global clustering method that uses the structure of a CF tree. Therefore, we propose a novel global clustering method ERC (effective multiple range queries-based clustering), which takes advantage of the structure of a CF tree. We further propose a CF$^+$ tree, which optimizes the node split scheme used in the CF tree. As a result, the CF$^+$-ERC (CF$^+$ tree-based ERC) method effectively computes clusters over large data sets. Furthermore, it does not require a predefined number of clusters to compute the clusters. We present in-depth theoretical and experimental analyses of our method. Experimental results on very large synthetic data sets demonstrate that the proposed approach is effective in terms of cluster quality and robustness and is significantly faster than existing clustering methods. In addition, we apply our clustering method to real data sets and achieve promising results. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
1041-4347
Volume :
32
Issue :
9
Database :
Academic Search Index
Journal :
IEEE Transactions on Knowledge & Data Engineering
Publication Type :
Academic Journal
Accession number :
145130663
Full Text :
https://doi.org/10.1109/TKDE.2019.2911520