1. The Impact of Supervised Manifold Learning on Structure Preserving and Classification Error: A Theoretical Study
- Author
- Isakh Weheliye, Laureta Hajderanj, and Daqing Chen
- Subjects
Computer Science::Machine Learning, General Computer Science, structure capturing, Computer science, Classification error, 02 engineering and technology, Machine learning, computer.software_genre, 03 medical and health sciences, manifold learning, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Representation (mathematics), supervised manifold learning, visualization, 030304 developmental biology, 0303 health sciences, business.industry, Dimensionality reduction, General Engineering, Nonlinear dimensionality reduction, Data structure, Manifold, Euclidean distance, ComputingMethodologies_PATTERNRECOGNITION, Embedding, 020201 artificial intelligence & image processing, Artificial intelligence, lcsh:Electrical engineering. Electronics. Nuclear engineering, Isomap, business, computer, lcsh:TK1-9971
- Abstract
In recent years, a variety of supervised manifold learning techniques have been proposed to outperform their unsupervised counterparts in terms of classification accuracy and data structure capturing. Some dissimilarity measures have been used in these techniques to guide the dimensionality reduction process. Their good performance has been demonstrated empirically; however, the corresponding theoretical analysis is still missing. This paper contributes a theoretical analysis of a) how dissimilarity measures affect the preservation of manifold neighbourhood structure and b) how supervised manifold learning techniques could contribute to the reduction of classification error. It also provides a cross-comparison between supervised and unsupervised manifold learning approaches in terms of structure capturing, using Kendall’s Tau coefficients and co-ranking matrices. Four metrics (three dissimilarity measures and Euclidean distance) are considered along with manifold learning methods such as Isomap, $t$-Stochastic Neighbour Embedding ($t$-SNE), and Laplacian Eigenmaps (LE), on two datasets: Breast Cancer and Swiss-Roll. The paper concludes that although the dissimilarity measures used in the manifold learning techniques can reduce classification error, they do not learn or preserve the structure of the hidden manifold in the high-dimensional space; instead, they destroy the structure of the data. Based on these findings, it is advisable to use supervised manifold learning techniques as a pre-processing step in classification. It is not advisable to apply supervised manifold learning for visualization purposes, since the two-dimensional representation it produces does not improve the preservation of data structure.
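The abstract names two structure-preservation measures, Kendall's Tau coefficients and co-ranking matrices, without giving details. The following is a minimal sketch of how such measures are commonly computed for an embedding, not the authors' protocol. It assumes Euclidean distances in both spaces and the tau-a variant of Kendall's coefficient; all function names (`pairwise_distances`, `distance_tau`, `coranking_matrix`) and the toy points are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance matrix as a list of lists."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def distance_tau(high, low):
    """Tau between pairwise distances in the original and embedded spaces."""
    dh, dl = pairwise_distances(high), pairwise_distances(low)
    pairs = list(combinations(range(len(high)), 2))
    return kendall_tau([dh[i][j] for i, j in pairs], [dl[i][j] for i, j in pairs])


def rank_matrix(dist):
    """ranks[i][j] = rank of j among the neighbours of i (1 = nearest)."""
    n = len(dist)
    ranks = [[0] * n for _ in range(n)]
    for i in range(n):
        # Ties are broken by index so the ranking is deterministic.
        order = sorted((j for j in range(n) if j != i), key=lambda j: (dist[i][j], j))
        for r, j in enumerate(order, start=1):
            ranks[i][j] = r
    return ranks


def coranking_matrix(high, low):
    """q[k][l] counts pairs with rank k+1 in the original space and l+1 in the embedding."""
    n = len(high)
    rh = rank_matrix(pairwise_distances(high))
    rl = rank_matrix(pairwise_distances(low))
    q = [[0] * (n - 1) for _ in range(n - 1)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[rh[i][j] - 1][rl[i][j] - 1] += 1
    return q


# Toy data (hypothetical): four 3-D points and two 1-D "embeddings".
high = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (7.0, 0.0, 0.0)]
low_good = [(0.0,), (1.0,), (3.0,), (7.0,)]   # keeps every distance
low_bad = [(0.0,), (7.0,), (1.0,), (3.0,)]    # scrambles the ordering

tau_good = distance_tau(high, low_good)
tau_bad = distance_tau(high, low_bad)
q_good = coranking_matrix(high, low_good)
q_bad = coranking_matrix(high, low_bad)
```

Under these assumptions, an embedding that preserves all distance orderings gives a tau of 1 and a purely diagonal co-ranking matrix. Off-diagonal mass indicates neighbours whose ranks changed, which is the kind of structural distortion the abstract attributes to supervised dissimilarity measures.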
- Published
- 2021