1. Unsupervised Feature Selection for Histogram-Valued Symbolic Data Using Hierarchical Conceptual Clustering
- Author
Kadri Umbleja, Manabu Ichino, and Hiroyuki Yaguchi
- Subjects
Computer science; Conceptual clustering; Feature selection; 02 engineering and technology; Similarity measure; hierarchical conceptual clustering; 01 natural sciences; 010104 statistics & probability; multi-role measure; Histogram; 0202 electrical engineering, electronic engineering, information engineering; 0101 mathematics; Cluster analysis; histogram-valued data; visualization; business.industry; Statistics; Pattern recognition; HA1-4737; Data set; ComputingMethodologies_PATTERNRECOGNITION; Compact space; Feature (computer vision); compactness; 020201 artificial intelligence & image processing; Artificial intelligence; business; unsupervised feature selection
- Abstract
This paper presents an unsupervised feature selection method for multi-dimensional histogram-valued data. We define a multi-role measure, called the compactness, based on the concept size of given objects and/or clusters described using a fixed number of equal-probability bin-rectangles. In each step of clustering, we agglomerate objects and/or clusters so as to minimize the compactness of the generated cluster. This means that the compactness plays the role of a similarity measure between the objects and/or clusters to be merged. Minimizing the compactness is equivalent to maximizing the dissimilarity of the generated cluster, i.e., concept, against the whole concept in each step. In this sense, the compactness also plays the role of a cluster quality measure. We also show that the average compactness of each feature with respect to objects and/or clusters in several clustering steps is useful as a feature effectiveness criterion. Features having small average compactness are mutually covariate and are able to detect a geometrically thin structure embedded in the given multi-dimensional histogram-valued data. We obtain a thorough understanding of the given data via visualization using dendrograms and scatter diagrams with respect to the selected informative features. We illustrate the effectiveness of the proposed method by using an artificial data set and real histogram-valued data sets.
- Published
2021
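- Illustrative sketch
The agglomeration rule described in the abstract (merge the pair whose generated cluster has the smallest compactness, then average per-feature compactness over several steps) can be illustrated with a heavily simplified sketch. It is not the paper's definition: as an assumption, each object is reduced to a single interval per feature (a one-bin histogram) instead of a set of equal-probability bin-rectangles, the merged concept is taken to be the smallest box containing both descriptions, and compactness is taken to be the average span of that box relative to the whole data range. The function names `join`, `compactness`, `agglomerate`, and `feature_scores`, and the toy data, are hypothetical.

```python
from itertools import combinations


def join(a, b):
    """Smallest box containing both descriptions (one (lo, hi) interval per feature)."""
    return [(min(x[0], y[0]), max(x[1], y[1])) for x, y in zip(a, b)]


def spans(box, domain):
    """Per-feature span of the box, normalized by the span of the whole data set."""
    return [
        (hi - lo) / (dhi - dlo) if dhi > dlo else 0.0
        for (lo, hi), (dlo, dhi) in zip(box, domain)
    ]


def compactness(box, domain):
    """Assumed stand-in measure: mean normalized span over all features, in [0, 1]."""
    s = spans(box, domain)
    return sum(s) / len(s)


def agglomerate(objects):
    """Repeatedly merge the pair whose joined box has minimal compactness.

    Returns one record per merge: (member names, compactness, per-feature spans).
    """
    n_features = len(next(iter(objects.values())))
    # The "whole concept": the range covered by all objects, feature by feature.
    domain = [
        (min(o[j][0] for o in objects.values()), max(o[j][1] for o in objects.values()))
        for j in range(n_features)
    ]
    clusters = {(name,): box for name, box in objects.items()}
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda p: compactness(join(clusters[p[0]], clusters[p[1]]), domain),
        )
        merged = join(clusters.pop(a), clusters.pop(b))
        clusters[a + b] = merged
        merges.append((a + b, compactness(merged, domain), spans(merged, domain)))
    return merges


def feature_scores(merges, steps):
    """Average normalized span of each feature over the first `steps` merges."""
    used = merges[:steps]
    n_features = len(used[0][2])
    return [sum(m[2][j] for m in used) / len(used) for j in range(n_features)]


# Toy data (hypothetical): four objects, two interval-valued features.
toy = {
    "A": [(0.0, 1.0), (0.0, 1.0)],
    "B": [(1.0, 2.0), (0.5, 1.5)],
    "C": [(8.0, 9.0), (8.0, 9.0)],
    "D": [(9.0, 10.0), (8.5, 9.5)],
}
merges = agglomerate(toy)
scores = feature_scores(merges, steps=2)
```

Under these assumptions, a feature with a smaller score keeps merged clusters narrow during the early steps, which is the sense in which the abstract treats small average compactness as a sign of an informative feature. The merge records also contain what a dendrogram needs: the members joined at each step and the compactness at which they were joined.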