A Novel Variable-order Markov Model for Clustering Categorical Sequences.

Authors :
Xiong, Tengke
Wang, Shengrui
Jiang, Qingshan
Huang, Joshua Zhexue
Source :
IEEE Transactions on Knowledge & Data Engineering; Oct2014, Vol. 26 Issue 10, p2339-2353, 15p
Publication Year :
2014

Abstract

Clustering categorical sequences is an important and difficult data mining task. Despite recent efforts, the challenge remains, due to the lack of an inherently meaningful measure of pairwise similarity. In this paper, we propose a novel variable-order Markov framework, named weighted conditional probability distribution (WCPD), to model clusters of categorical sequences. We propose an efficient and effective approach to solve the challenging problem of model initialization. To initialize the WCPD model, we propose to use a first-order Markov model built on a weighted fuzzy indicator vector representation of categorical sequences, which we call the WFI Markov model. Based on a cascade optimization framework that combines the WCPD and WFI models, we design a new divisive hierarchical clustering algorithm for clustering categorical sequences. Experimental results on data sets from three different domains demonstrate the promising performance of our models and clustering algorithm. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
1041-4347
Volume :
26
Issue :
10
Database :
Complementary Index
Journal :
IEEE Transactions on Knowledge & Data Engineering
Publication Type :
Academic Journal
Accession number :
98013450
Full Text :
https://doi.org/10.1109/TKDE.2013.104