1. HY-DBSCAN: A hybrid parallel DBSCAN clustering algorithm scalable on distributed-memory computers.
- Author
- Wu, Guoqing; Cao, Liqiang; Tian, Hongyun; Wang, Wei
- Subjects
- *MODERN architecture, *ALGORITHMS, *COMPUTER workstation clusters, *COMPUTERS, *PARALLEL algorithms, *DISTRIBUTED algorithms, *SCALABILITY
- Abstract
• A parallel, scalable DBSCAN algorithm which outperforms other implementations.
• Optimizations for data partitioning, spatial indexing, and cluster merging.
• Exploiting hybrid parallelization to take advantage of modern HPC architectures.
• Demonstrating accuracy, performance and scalability of our algorithm.
DBSCAN is a density-based clustering algorithm which is well known for its ability to discover clusters of arbitrary shape as well as to distinguish noise. As it is computationally expensive for large datasets, research on the parallelization of DBSCAN has received a considerable amount of attention. In this paper we present an exact, efficient and scalable parallel DBSCAN algorithm which we call HY-DBSCAN. It employs three major techniques to enable scalable data clustering on distributed-memory computers: (i) a modified kd-tree for domain decomposition, (ii) a spatial indexing approach based on grid and inference, and (iii) a cluster merging scheme based on a distributed version of Rem's Union-Find algorithm. Moreover, HY-DBSCAN exploits both process-level and thread-level parallelization. In experiments, we demonstrate performance and scalability using two scientific datasets on up to 2048 cores of a distributed-memory computer. Through extensive evaluation, we show that HY-DBSCAN significantly outperforms previous state-of-the-art DBSCAN implementations. [ABSTRACT FROM AUTHOR]
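The cluster merging step named in the abstract builds on Rem's Union-Find algorithm. As a rough illustration only, the following is a minimal sequential sketch of Rem's union with splicing, in which merging two clusters means joining the sets that contain one point from each. It is not the distributed variant used in the paper, and the names `make_sets`, `find`, and `rem_union` are hypothetical, chosen for this example.

```python
def make_sets(n):
    """Each of the n points starts as its own cluster: parent[i] == i."""
    return list(range(n))


def find(parent, x):
    """Follow parent pointers to the root (the cluster representative)."""
    while parent[x] != x:
        x = parent[x]
    return x


def rem_union(parent, u, v):
    """Rem's union with splicing (sequential sketch).

    Walks up from u and v at the same time, always advancing the node whose
    parent index is smaller, and redirecting it to the larger parent as it
    goes. Returns True if two distinct sets were merged, False if u and v
    were already in the same set.
    """
    while parent[u] != parent[v]:
        if parent[u] < parent[v]:
            if parent[u] == u:          # u is a root: link it under v's parent
                parent[u] = parent[v]
                return True
            nxt = parent[u]             # splice: repoint u, then move upward
            parent[u] = parent[v]
            u = nxt
        else:
            if parent[v] == v:          # v is a root: link it under u's parent
                parent[v] = parent[u]
                return True
            nxt = parent[v]
            parent[v] = parent[u]
            v = nxt
    return False


# Hypothetical usage: points 0, 1, 2 are density-connected, as are 3 and 4.
parent = make_sets(6)
merged = [rem_union(parent, 0, 1), rem_union(parent, 1, 2),
          rem_union(parent, 0, 2), rem_union(parent, 3, 4)]
labels = [find(parent, i) for i in range(6)]
```

In this sketch the root index serves as the cluster label, so points sharing a label belong to the same cluster; point 5 is never merged and keeps its own label.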
- Published
- 2022