1. An Optimized K-Means Algorithm of Reducing Cluster Intra-dissimilarity for Document Clustering.
- Author
Fan, Wenfei; Wu, Zhaohui; Yang, Jun; Wang, Daling; Yu, Ge; Bao, Yubin; Zhang, Meng
- Abstract
Because documents are high-dimensional and sparse, clustering similar documents together is a difficult task. K-Means, the most popular document clustering method, suffers from high intra-cluster dissimilarity, i.e., it tends to cluster unrelated documents together. One reason is that all objects (documents) in a cluster exert the same influence on the cluster mean. SOM (Self-Organizing Map) is a method that reduces the dimensionality of data and displays the data in a low-dimensional space, and it has been applied successfully to clustering high-dimensional objects. The scalar factor is an important part of SOM. This paper proposes an optimized K-Means algorithm that introduces the scalar factor from SOM into the means during the K-Means assignment stage to control the influence of new objects on the means. Experiments show that the optimized K-Means algorithm achieves a higher F-Measure and lower Entropy than the standard K-Means algorithm, thereby effectively reducing the intra-dissimilarity of clusters. [ABSTRACT FROM AUTHOR]
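The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the update formula, so the code assumes the common SOM-style rule in which a mean moves toward each newly assigned document by a fraction alpha. The names `scaled_kmeans` and `alpha0`, the linear decay schedule, and the use of cosine similarity are all hypothetical choices made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def scaled_kmeans(docs, init_means, epochs=10, alpha0=0.5):
    """K-Means-like clustering with a SOM-style scalar factor (assumed form).

    Instead of recomputing each mean as the plain average of its members,
    a mean is nudged toward every newly assigned document by `alpha`,
    so later assignments influence the mean less as `alpha` decays.
    """
    means = [list(m) for m in init_means]
    labels = [0] * len(docs)
    for epoch in range(epochs):
        # Assumption: linear decay of the scalar factor over epochs.
        alpha = alpha0 * (1.0 - epoch / epochs)
        for i, doc in enumerate(docs):
            # Assignment stage: pick the most similar mean.
            best = max(range(len(means)),
                       key=lambda k: cosine_similarity(doc, means[k]))
            labels[i] = best
            # SOM-style update: move the winning mean toward the document.
            means[best] = [m + alpha * (x - m)
                           for m, x in zip(means[best], doc)]
    return labels, means


if __name__ == "__main__":
    docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
            [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]
    labels, means = scaled_kmeans(docs, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    print(labels)
```

With a small `alpha`, a document that lands in a poorly matching cluster shifts that cluster's mean only slightly, which is one plausible reading of how the scalar factor limits the influence of new objects.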
- Published
- 2005