Descriptor: "*HAMMING distance" / Publisher: taylor & francis ltd / Topic: algorithms and autoclass - Searchworks@Jio Institute Digital Library Search Results

1. Clustering Categorical Data Based on Distance Vectors.

Author: Zhang, Peng, Wang, Xiaogang, and Song, Peter X.-K.
Subjects: *CLUSTER analysis (Statistics), *ALGORITHMS, *COMPUTATIONAL complexity, *QUANTITATIVE research, *STATISTICAL correlation, *STATISTICS
Abstract: We introduce a novel statistical procedure for clustering categorical data based on Hamming distance (HD) vectors. The proposed method is conceptually simple and computationally straightforward, because it does not require any specific statistical models or any convergence criteria. Moreover, unlike most currently existing algorithms that compute the class membership or membership probability for every data point at each iteration, our algorithm sequentially extracts clusters from the given dataset. That is, at each iteration our algorithm strives to identify only one cluster, which will then be deleted from the dataset at the next iteration; this procedure repeats until there are no more significant clusters in the remaining data. Consequently, the number of clusters can be determined automatically by the algorithm. As for the identification and extraction of a cluster, we first locate the cluster center by using a Pearson chi-squared-type statistic on the basis of HD vectors. The partition of the dataset produced by our algorithm is unique and insensitive to the input order of data points. The performance of the proposed algorithm is examined using both simulated and real world datasets. Comparisons with two well-known clustering algorithms, K-modes and AutoClass, show that the proposed algorithm substantially outperforms these competitors, with the classification rate or the information gain typically improved by several orders of magnitude. Computational complexity and run time comparisons are also provided. [ABSTRACT FROM AUTHOR]
Published: 2006
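Note: The abstract builds on Hamming distance (HD) vectors but does not define them. As a minimal sketch, assuming an HD vector is the frequency count of Hamming distances from a chosen reference record to every record in the dataset, the idea can be illustrated as follows. The function names `hamming_distance` and `hd_vector` and the toy data are hypothetical, and this is not the authors' clustering procedure or their chi-squared-type statistic.

```python
from collections import Counter


def hamming_distance(a, b):
    """Count the attribute positions at which two categorical records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def hd_vector(data, center):
    """Return counts of records at Hamming distance 0..m from `center`.

    Assumption: this frequency-vector reading of "HD vector" is
    illustrative only; the abstract does not spell out the definition.
    """
    m = len(center)
    counts = Counter(hamming_distance(row, center) for row in data)
    return [counts.get(d, 0) for d in range(m + 1)]


# Toy categorical dataset with three attributes per record (made up).
data = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
]

print(hd_vector(data, ("red", "small", "round")))  # [1, 2, 0, 1]
```

A reference record that sits inside a tight cluster would tend to produce a vector with many counts at small distances, which is the kind of pattern a cluster-center test could look for.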