A Novel Variable-order Markov Model for Clustering Categorical Sequences.
- Source :
- IEEE Transactions on Knowledge & Data Engineering; Oct 2014, Vol. 26, Issue 10, pp. 2339-2353 (15 pages)
- Publication Year :
- 2014
Abstract
- Clustering categorical sequences is an important and difficult data mining task. Despite recent efforts, the challenge remains, due to the lack of an inherently meaningful measure of pairwise similarity. In this paper, we propose a novel variable-order Markov framework, named weighted conditional probability distribution (WCPD), to model clusters of categorical sequences. We propose an efficient and effective approach to solve the challenging problem of model initialization. To initialize the WCPD model, we propose to use a first-order Markov model built on a weighted fuzzy indicator vector representation of categorical sequences, which we call the WFI Markov model. Based on a cascade optimization framework that combines the WCPD and WFI models, we design a new divisive hierarchical clustering algorithm for clustering categorical sequences. Experimental results on data sets from three different domains demonstrate the promising performance of our models and clustering algorithm. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 1041-4347
- Volume :
- 26
- Issue :
- 10
- Database :
- Complementary Index
- Journal :
- IEEE Transactions on Knowledge & Data Engineering
- Publication Type :
- Academic Journal
- Accession number :
- 98013450
- Full Text :
- https://doi.org/10.1109/TKDE.2013.104