1. A randomized algorithm for clustering discrete sequences.
- Author
- Jiang, Mudi; Hu, Lianyu; Han, Xin; Zhou, Yong; and He, Zengyou
- Subjects
- *VECTOR data, *ALGORITHMS, *CLUSTER analysis (Statistics), *RANDOM sets, *DATA mining, *ORDERED sets, *WEIGHTED graphs
- Abstract
- Cluster analysis is one of the most important research issues in data mining and machine learning. To date, numerous clustering algorithms have been proposed to handle fixed-length vector data. In many real applications, we need to detect clusters from a set of discrete sequences in which each sequence is an ordered list of items. Because of the sequential and discrete nature of the data, the discrete sequence clustering problem is more challenging, and most existing vector data clustering algorithms cannot be directly employed. In this paper, we present a stochastic algorithm for clustering discrete sequences. Our method first quickly generates a set of random partitions over the sequential data set and then merges these random clustering results via weighted graph construction and partition. We perform extensive empirical comparisons on real data sets to show that our method is comparable to state-of-the-art clustering algorithms with respect to both accuracy and efficiency.
• A randomized algorithm is proposed to solve the discrete sequence clustering issue.
• Our algorithm provides a high success rate for sequence linkage in clusters.
• Experimental results demonstrate the feasibility and effectiveness of our method.
[ABSTRACT FROM AUTHOR]
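The two-stage idea in the abstract (many cheap random partitions, then a merge through a weighted graph) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the bigram Jaccard similarity, the random-pivot assignment, the co-association edge weights, the threshold, and the connected-components partitioning step are all assumptions chosen for illustration, and every function name below is hypothetical.

```python
import random
from itertools import combinations

def bigrams(seq):
    """Set of adjacent item pairs: a simple order-aware summary of a sequence."""
    return {(a, b) for a, b in zip(seq, seq[1:])}

def similarity(s, t):
    """Jaccard similarity of bigram sets (an assumed stand-in measure)."""
    a, b = bigrams(s), bigrams(t)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def random_partition(seqs, k, rng):
    """Pick k random pivot sequences; label each sequence by its most similar pivot."""
    pivots = rng.sample(range(len(seqs)), k)
    return [max(pivots, key=lambda p: similarity(s, seqs[p])) for s in seqs]

def co_association(seqs, k, rounds, rng):
    """Edge weight (i, j) = fraction of random partitions that group i and j together."""
    counts = {pair: 0 for pair in combinations(range(len(seqs)), 2)}
    for _ in range(rounds):
        labels = random_partition(seqs, k, rng)
        for i, j in counts:
            counts[(i, j)] += labels[i] == labels[j]
    return {pair: c / rounds for pair, c in counts.items()}

def partition_graph(n, weights, threshold):
    """Keep edges with weight >= threshold; return a component label per node."""
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), w in weights.items():
        if w >= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(n)]

# Toy data: two groups of sequences over disjoint alphabets (assumed example).
seqs = ["abcabc", "abcab", "bcabc", "xyzxyz", "xyzxy", "yzxyz"]
rng = random.Random(0)
weights = co_association(seqs, k=2, rounds=200, rng=rng)
labels = partition_graph(len(seqs), weights, threshold=0.6)
```

In this sketch a single random partition is often poor (both pivots can fall in the same group), but pairs that belong together co-occur far more often across many rounds, so thresholding the averaged weights separates the two groups.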
- Published
- 2024