
Incremental entropy-based clustering on categorical data streams with concept drift.

Authors :
Li, Yanhong
Li, Deyu
Wang, Suge
Zhai, Yanhui
Source :
Knowledge-Based Systems. Mar. 2014, Vol. 59, pp. 33–47 (15 pp.).
Publication Year :
2014

Abstract

Clustering on categorical data streams is a relatively new field that has received less attention than clustering of static data or numerical data streams. One of the main difficulties in categorical data analysis is the lack of an appropriate way to define a similarity or dissimilarity measure on the data. In this paper, we propose three dissimilarity measures: a point-cluster dissimilarity measure and a cluster–cluster dissimilarity measure, both based on incremental entropy, and a dissimilarity measure between two cluster distributions, based on sample standard deviation. We then propose an integrated framework for clustering categorical data streams with three algorithms: Minimal Dissimilarity Data Labeling (MDDL), Concept Drift Detection (CDD) and Cluster Evolving Analysis (CEA). We also compare the proposed algorithms with other algorithms on several data streams synthesized from real data sets. Experiments show that the proposed algorithms are more effective than the compared algorithms at generating clustering results and detecting concept drift. [Copyright © Elsevier]
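The abstract does not define the measures, so the following Python sketch is only a rough illustration under stated assumptions, not the authors' formulation. It assumes that the point-cluster "incremental entropy" is the increase in a cluster's size-weighted Shannon entropy (summed over attributes) when a point is added, and that MDDL-style labeling assigns each incoming point to the existing cluster with the smallest such increase. The names `Cluster`, `incremental_entropy` and `label_points` are hypothetical.

```python
import math
from collections import Counter


class Cluster:
    """A categorical cluster summarised by per-attribute value counts (hypothetical)."""

    def __init__(self, n_attrs):
        self.size = 0
        self.counts = [Counter() for _ in range(n_attrs)]

    def add(self, point):
        self.size += 1
        for attr, value in enumerate(point):
            self.counts[attr][value] += 1


def _g(k):
    # k * log2(k), with 0 * log2(0) taken as 0
    return k * math.log2(k) if k > 0 else 0.0


def incremental_entropy(cluster, point):
    """Assumed measure: growth of size * H(cluster) if `point` were added.

    Per attribute, size * H = g(n) - sum_v g(c_v), so adding a point with
    value v changes it by [g(n+1) - g(n)] - [g(c_v+1) - g(c_v)] >= 0.
    """
    n = cluster.size
    delta = 0.0
    for attr, value in enumerate(point):
        c = cluster.counts[attr][value]  # Counter returns 0 for unseen values
        delta += (_g(n + 1) - _g(n)) - (_g(c + 1) - _g(c))
    return delta


def label_points(clusters, points):
    """Label each point with the index of the least-dissimilar cluster.

    Clusters are treated as a fixed reference (not updated while labeling);
    ties go to the lowest index.
    """
    return [
        min(range(len(clusters)), key=lambda i: incremental_entropy(clusters[i], p))
        for p in points
    ]


# Usage: two small reference clusters over (colour, size) attributes.
a = Cluster(2)
for p in [("red", "small"), ("red", "small"), ("red", "large")]:
    a.add(p)
b = Cluster(2)
for p in [("blue", "large"), ("blue", "large")]:
    b.add(p)

print(label_points([a, b], [("red", "small"), ("blue", "large")]))
```

With these toy clusters the sketch prints `[0, 1]`. A point whose values match every member of a cluster adds zero entropy to it, which is why `("blue", "large")` lands in `b`. The closed form avoids recomputing the full entropy for every candidate cluster, which matters on a stream.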

Details

Language :
English
ISSN :
0950-7051
Volume :
59
Database :
Academic Search Index
Journal :
Knowledge-Based Systems
Publication Type :
Academic Journal
Accession number :
94792983
Full Text :
https://doi.org/10.1016/j.knosys.2014.02.004