
Automatic Recommendation of a Distance Measure for Clustering Algorithms

Authors :
Tian Zheng
Jiayin Wang
Yingbin Li
Xiaoyan Zhu
Jingwen Fu
Source :
ACM Transactions on Knowledge Discovery from Data. 15:1-22
Publication Year :
2020
Publisher :
Association for Computing Machinery (ACM), 2020.

Abstract

Given the large number of available distance measures, choosing an appropriate one for clustering a given data set with a specified clustering algorithm is an important problem. In this article, an automatic distance measure recommendation method for clustering algorithms is proposed. The recommendation method consists of the following steps: (1) metadata extraction, including meta-feature collection and meta-target identification; (2) construction of a recommendation model from the metadata; and (3) recommendation of a distance measure for a new data set using the recommendation model. Two types of meta-targets and meta-learning techniques are used to accommodate the different requirements users may have. To validate the necessity and effectiveness of the distance measure recommendation method, an empirical study is conducted with 199 publicly available data sets, 9 distance measures, and 2 widely used clustering algorithms. The experimental results indicate that the choice of distance measure significantly influences the performance of a clustering algorithm on a given data set. Furthermore, a performance analysis of the proposed recommendation method demonstrates its effectiveness.
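The three steps listed in the abstract can be illustrated with a minimal meta-learning sketch. It is not the authors' method: this record does not say which meta-features, meta-targets, or learners the paper uses, so the three meta-features, the measure names, the scores, and the choice of k-nearest-neighbour voting (for a single "best measure" meta-target) and average-rank aggregation (for a ranking meta-target) are all hypothetical. Clustering quality scores for each training data set and distance measure are assumed to be precomputed.

```python
import math
from collections import Counter

# Step 1a: meta-feature collection (hypothetical, illustrative meta-features).
def meta_features(data):
    """Return [log #instances, log #attributes, mean attribute std] for a list of numeric rows."""
    n, d = len(data), len(data[0])
    cols = list(zip(*data))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) for c, m in zip(cols, means)]
    return [math.log(n), math.log(d), sum(stds) / d]

# Step 1b: meta-target identification from precomputed clustering scores.
# `scores` maps a distance-measure name to the quality of the clustering it produced.
def best_measure(scores):
    """Single-label meta-target (assumed): the measure with the highest score."""
    return max(scores, key=scores.get)

def rank_measures(scores):
    """Ranking meta-target (assumed alternative): measures ordered best to worst."""
    return sorted(scores, key=scores.get, reverse=True)

# Step 2: the "model" here is just the stored metadata (a lazy k-NN learner).
def build_metadata(datasets, score_tables, target=best_measure):
    """Pair each training data set's meta-features with its meta-target."""
    return [(meta_features(ds), target(sc)) for ds, sc in zip(datasets, score_tables)]

def _nearest(query, metadata, k):
    """The k training entries closest in (unnormalized) Euclidean meta-feature distance."""
    return sorted(metadata, key=lambda entry: math.dist(query, entry[0]))[:k]

# Step 3: recommendation for a new data set.
def recommend_measure(new_data, metadata, k=3):
    """Majority vote over the best measures of the k most similar training data sets."""
    votes = Counter(label for _, label in _nearest(meta_features(new_data), metadata, k))
    return votes.most_common(1)[0][0]

def recommend_ranking(new_data, metadata, k=3):
    """Order measures by average rank position among the k nearest training rankings."""
    nearest = _nearest(meta_features(new_data), metadata, k)
    avg = {}
    for _, ranking in nearest:
        for pos, name in enumerate(ranking):
            avg[name] = avg.get(name, 0.0) + pos / len(nearest)
    return sorted(avg, key=avg.get)
```

k-NN is used only because it needs no training phase; any classifier (for the single-label target) or label-ranking learner (for the ranking target) could take its place, and in practice the meta-features would normally be normalized before computing distances.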

Details

ISSN :
1556-472X and 1556-4681
Volume :
15
Database :
OpenAIRE
Journal :
ACM Transactions on Knowledge Discovery from Data
Accession number :
edsair.doi...........8f9bb39cb70fd8225c6286dafa70030d