The Exploitation of Distance Distributions for Clustering
- Publication Year :
- 2021
- Publisher :
- arXiv, 2021.
- Abstract :
- Although distance measures are used in many machine learning algorithms, the literature on the context-independent selection and evaluation of distance measures is limited in the sense that prior knowledge is used. In cluster analysis, current studies evaluate the choice of distance measure after applying unsupervised methods based on error probabilities, implicitly setting the goal of reproducing predefined partitions in data. Such studies use clusters of data that are often based on the context of the data as well as the custom goal of the specific study. Depending on the data context, different properties for distance distributions are judged to be relevant for appropriate distance selection. However, if cluster analysis is based on the task of finding similar partitions of data, then the intrapartition distances should be smaller than the interpartition distances. By systematically investigating this specification using distribution analysis through a mirrored-density plot, it is shown that multimodal distance distributions are preferable in cluster analysis. As a consequence, it is advantageous to model distance distributions with Gaussian mixtures prior to the evaluation phase of unsupervised methods. Experiments are performed on several artificial datasets and natural datasets for the task of clustering.
- Comment: 19 pages, 6 figures
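The specification stated in the abstract, that intrapartition distances should be smaller than interpartition distances, can be illustrated with a minimal sketch. It is not the authors' method: the toy data (two well-separated 2-D Gaussian blobs), the Euclidean distance, and the helper names `pairwise_distances` and `split_by_partition` are all assumptions made for illustration, and the sketch covers neither the mirrored-density plot nor the Gaussian mixture modelling mentioned in the abstract.

```python
import itertools
import math
import random


def pairwise_distances(points):
    """Return (i, j, distance) for every unordered pair of points (Euclidean)."""
    return [
        (i, j, math.dist(points[i], points[j]))
        for i, j in itertools.combinations(range(len(points)), 2)
    ]


def split_by_partition(distances, labels):
    """Split distances into intrapartition and interpartition groups."""
    intra = [d for i, j, d in distances if labels[i] == labels[j]]
    inter = [d for i, j, d in distances if labels[i] != labels[j]]
    return intra, inter


# Hypothetical toy data: two well-separated blobs of 50 points each.
rng = random.Random(0)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
points += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(50)]
labels = [0] * 50 + [1] * 50

intra, inter = split_by_partition(pairwise_distances(points), labels)
mean_intra = sum(intra) / len(intra)
mean_inter = sum(inter) / len(inter)

print(f"mean intrapartition distance: {mean_intra:.2f}")
print(f"mean interpartition distance: {mean_inter:.2f}")
```

On data like this, the pooled distances form two separated groups (short within-blob distances and long between-blob distances), which is the kind of multimodal distance distribution the abstract describes as preferable for cluster analysis.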
- Subjects :
- FOS: Computer and information sciences
Computer Science - Machine Learning
Data, context and interaction
I.5.2
I.5.3
Computer science
Gaussian
Pattern recognition
Context (language use)
Measure (mathematics)
Plot (graphics)
Distance measures
Computer Science Applications
Theoretical Computer Science
Machine Learning (cs.LG)
Artificial intelligence
Cluster analysis
Software
Selection (genetic algorithm)
62H30
Details
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....494cc662b8da833e5d73133ff73a258c
- Full Text :
- https://doi.org/10.48550/arxiv.2108.09649