251. Clustering XML Documents by Structure Based on Common Neighbor
- Author
- Zhengxuan Wang, Xizhe Zhang, Wanli Zuo, and Tianyang Lv
- Subjects
DBSCAN, Clustering high-dimensional data, Fuzzy clustering, Brown clustering, Computer science, Correlation clustering, Constrained clustering, Conceptual clustering, computer.software_genre, Biclustering, ComputingMethodologies_PATTERNRECOGNITION, Data stream clustering, CURE data clustering algorithm, Consensus clustering, Outlier, Canopy clustering algorithm, Affinity propagation, FLAME clustering, Anomaly detection, Data mining, Cluster analysis, computer
- Abstract
Clustering XML documents is an important task. However, it is difficult to select appropriate parameter values for clustering algorithms. Moreover, current clustering algorithms lack an effective mechanism for detecting outliers and simply treat them as “noise”. By integrating outlier detection with clustering, the paper takes a new approach to analyzing XML documents by structure. After introducing the concept of a common-neighbor-based outlier, the paper proposes a new clustering algorithm that stops clustering automatically by using the outlier information and needs only one parameter, whose appropriate value range is determined during the outlier mining process. After discussing some features of the proposed algorithm, the paper compares it with other clustering algorithms on XML datasets with different structures and on other real-life datasets.
- Published
- 2005