1. A new distance between multivariate clusters of varying locations, elliptical shapes, and directions.
- Author
- Hadi, Ali S.
- Abstract
• A new method is proposed for measuring the distance between pairs of clusters in a dataset.
• The proposed distance accurately captures both the variability of the cluster centers and the variability of the shapes and directions of their respective covariance matrices.
• The method has a number of advantages, including simplicity, interpretability, and computational efficiency.
• Both classical and robust versions of the distance are provided.
• The distance is illustrated by several motivating examples that demonstrate the need for it, and it is applied to both real and synthetic data.
• The Ward distance and the Euclidean distance are proved to be equivalent.
Clustering methods are based on computing the distances between every pair of the n observations in a multivariate dataset as well as the distances between every pair of clusters in the dataset. The clusters can have different locations and varying elliptical shapes and directions. Numerous methods have been proposed in the literature for computing both types of distances. The contributions of this paper are twofold. First, we propose a new elliptical distance between pairs of clusters in a dataset with different cluster centers and elliptical shapes and directions. Second, we prove analytically that the Ward distance and the Euclidean distance are equivalent. We propose a new classical method for computing the distance between a pair of clusters in the dataset. It is the only distance that does not assume spherical clusters. The proposed classical distances can also be made robust by replacing the estimates of location and scale with their respective robust estimators.
The proposed distance has a number of advantages, including simplicity, interpretability, computational efficiency, and the ability to accurately capture both the variability of the cluster centers and the variability of the shapes and directions of their respective covariance matrices. The method is illustrated by several motivating examples that demonstrate the need for the proposed distance. Its superiority is also demonstrated by application to real-life as well as challenging synthetic data. [ABSTRACT FROM AUTHOR]
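The abstract does not state the exact form of the Ward/Euclidean equivalence. As a rough illustration only, the sketch below checks one standard identity connected with Ward's criterion: the increase in within-cluster sum of squares caused by merging two clusters equals n_a n_b / (n_a + n_b) times the squared Euclidean distance between their centroids. Whether this is the result proved in the paper is an assumption, and the function names and simulated data are hypothetical. The paper's new elliptical distance is not reproduced here, since its formula is not given in this record.

```python
import numpy as np

def sse(points):
    """Within-cluster sum of squared deviations from the centroid."""
    centered = points - points.mean(axis=0)
    return float(np.sum(centered ** 2))

def ward_merge_cost(a, b):
    """Increase in total within-cluster SSE caused by merging clusters a and b."""
    return sse(np.vstack([a, b])) - sse(a) - sse(b)

def scaled_centroid_distance(a, b):
    """n_a * n_b / (n_a + n_b) times the squared Euclidean centroid distance."""
    n_a, n_b = len(a), len(b)
    diff = a.mean(axis=0) - b.mean(axis=0)
    return n_a * n_b / (n_a + n_b) * float(diff @ diff)

# Two simulated clusters with different centers and different axis-aligned
# spreads (illustrative data, not taken from the paper).
rng = np.random.default_rng(0)
a = rng.normal(loc=[0.0, 0.0], scale=[1.0, 3.0], size=(40, 2))
b = rng.normal(loc=[4.0, 1.0], scale=[2.0, 0.5], size=(25, 2))

# The two quantities agree up to floating-point error.
print(np.isclose(ward_merge_cost(a, b), scaled_centroid_distance(a, b)))
```

Both quantities depend on the clusters only through their sizes and centroids, so neither reflects the shape or direction of the cluster covariance matrices, which is the gap the abstract says the proposed elliptical distance is meant to address.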
- Published
- 2022