Automatic Recommendation of a Distance Measure for Clustering Algorithms
- Source :
- ACM Transactions on Knowledge Discovery from Data. 15:1-22
- Publication Year :
- 2020
- Publisher :
- Association for Computing Machinery (ACM), 2020.
Abstract
- With a large number of distance measures, the appropriate choice for clustering a given data set with a specified clustering algorithm becomes an important problem. In this article, an automatic distance measure recommendation method for clustering algorithms is proposed. The recommendation method consists of the following steps: (1) metadata extraction, including meta-feature collection and meta-target identification; (2) recommendation model construction using metadata; and (3) distance measure recommendation for a new data set by the recommendation model. Two different types of meta-targets and meta-learning techniques are utilized considering the possible different requirements of users. To validate the necessity and effectiveness of the distance measure recommendation method, an empirical study is conducted with 199 publicly available data sets, 9 distance measures, and 2 widely used clustering algorithms. The experimental results indicate that distance measure significantly influences the performance of the clustering algorithm for a given data set. Furthermore, performance analysis of the proposed recommendation method proves its effectiveness.
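The three steps named in the abstract can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the record does not say which meta-features, meta-targets, or meta-learners the article uses, so the three statistics in `extract_meta_features`, the single "best measure" label used as meta-target, and the k-nearest-neighbor vote in `recommend` are all assumptions, and every function name is hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def extract_meta_features(data):
    """Step 1: describe a data set (list of numeric rows) by simple statistics.

    The three statistics below are assumed for illustration only.
    """
    columns = list(zip(*data))
    return [
        math.log(len(data)),                     # data set size (log scale)
        float(len(columns)),                     # number of attributes
        mean([pstdev(col) for col in columns]),  # average attribute spread
    ]


def build_model(meta_examples):
    """Step 2: keep (meta-features, meta-target) pairs for neighbor lookup.

    Each meta-target is assumed to be the name of the distance measure
    that performed best on that data set in earlier experiments.
    """
    return list(meta_examples)


def recommend(model, data, k=3):
    """Step 3: recommend the measure most common among the k nearest
    stored data sets, compared by Euclidean distance on meta-features."""
    query = extract_meta_features(data)
    nearest = sorted(model, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(target for _, target in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical metadata; real entries would come from clustering runs.
    model = build_model([
        ([math.log(4), 2.0, 1.0], "euclidean"),
        ([math.log(4), 2.0, 1.2], "euclidean"),
        ([math.log(1000), 50.0, 9.0], "cosine"),
    ])
    new_data = [[0, 0], [2, 0], [0, 2], [2, 2]]
    print(recommend(model, new_data))
```

In practice the meta-features would normally be rescaled before computing neighbor distances, and the meta-targets would be obtained by running each clustering algorithm with each candidate distance measure on every data set in the collection.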
- Subjects :
- Measure (data warehouse)
General Computer Science
Meta learning (computer science)
Computer science
05 social sciences
050301 education
02 engineering and technology
computer.software_genre
Distance measures
Data set
Metadata
Identification (information)
Empirical research
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Data mining
Cluster analysis
0503 education
computer
Details
- ISSN :
- 1556-472X and 1556-4681
- Volume :
- 15
- Database :
- OpenAIRE
- Journal :
- ACM Transactions on Knowledge Discovery from Data
- Accession number :
- edsair.doi...........8f9bb39cb70fd8225c6286dafa70030d