1. A Topological Approach of Principal Component Analysis
- Author
- Rafik Abdesselam, Entrepôts, Représentation et Ingénierie des Connaissances (ERIC), Université Lumière - Lyon 2 (UL2)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon, COACTIS (COACTIS), and Université Lumière - Lyon 2 (UL2)-Université Jean Monnet [Saint-Étienne] (UJM)
- Subjects
Structure (mathematical logic), Neighborhood graph, Computer science, Covariance matrix, Adjacency matrix, MDS graphical representation, Topology, Proximity measure, Topological equivalence, Set (abstract data type), Data set, Statistics [stat], Correlation matrix, Principal component analysis, Data analysis, Multidimensional scaling, Correlation
- Abstract
Large datasets are increasingly widespread in many disciplines. The exponential growth of data calls for more data analysis methods that process information efficiently. To visualize such data, methods such as Principal Component Analysis (PCA) and MultiDimensional Scaling (MDS) extract a low-dimensional structure from a high-dimensional data set. The proposed approach, called Topological Principal Component Analysis (TPCA), is a multidimensional descriptive method which studies a homogeneous set of continuous variables defined on the same set of individuals. It is a topological method of data analysis that consists of comparing and classifying proximity measures chosen from among the most widely used proximity measures for continuous data. Proximity measures play an important role in many areas of data analysis, and the results strongly depend on the proximity measure chosen. So, among the many existing measures, which one is most useful? Are they all equivalent? How can one identify the measure most appropriate for analyzing the correlation structure of a set of quantitative variables? TPCA proposes an appropriate adjacency matrix, associated with an unknown proximity measure, according to the data under consideration; it then analyzes and visualizes, with graphical representations, the relationship structure of the variables in the setting of the well-known PCA problem. It uses the concept of neighborhood graphs and compares a set of proximity measures for continuous data, which can be more or less equivalent. A topological equivalence criterion between two proximity measures is defined and statistically tested according to the topological correlation between the variables considered. An example on real data illustrates the proposed approach.
French abstract (translated): The aim of this paper is to propose a topological approach to data analysis that consists of exploring, analyzing and representing the correlation structure of a set of quantitative variables in the context of principal component analysis. Similarity measures play an important role in many areas of data analysis. The results of any operation that structures, classifies or ranks objects depend strongly on the proximity measure chosen. Based on the notion of neighborhood graphs, some of these proximity measures are more or less equivalent. The notion of topological equivalence between two measures is defined and statistically tested according to the description of the correlations between the variables. An example on real data illustrates this topological approach.
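The abstract does not say how the neighborhood graph is built or how two proximity measures are compared, so the following is only a minimal sketch under stated assumptions: it uses a relative neighborhood graph over the variables, Euclidean and Manhattan distances as the two proximity measures, and the share of matching adjacency entries as a stand-in for the equivalence criterion. The function names `variable_distances`, `rng_adjacency`, and `concordance` are hypothetical and do not come from the paper.

```python
import numpy as np


def variable_distances(X, kind="euclidean"):
    """Pairwise distances between the standardized columns (variables) of X."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    diff = Z.T[:, None, :] - Z.T[None, :, :]  # shape (p, p, n)
    if kind == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=2))
    if kind == "manhattan":
        return np.abs(diff).sum(axis=2)
    raise ValueError(f"unknown proximity measure: {kind}")


def rng_adjacency(dist):
    """Adjacency matrix of a relative neighborhood graph (assumed graph type)."""
    p = dist.shape[0]
    adj = np.eye(p, dtype=int)  # assumption: each variable neighbors itself
    for k in range(p):
        for l in range(k + 1, p):
            others = [t for t in range(p) if t not in (k, l)]
            # k and l are neighbors when no third variable is closer to both
            if all(dist[k, l] <= max(dist[k, t], dist[l, t]) for t in others):
                adj[k, l] = adj[l, k] = 1
    return adj


def concordance(adj_a, adj_b):
    """Share of matching entries between two adjacency matrices (assumed criterion)."""
    return float(np.mean(adj_a == adj_b))


# Illustrative synthetic data: 50 individuals, 6 continuous variables.
X = np.random.default_rng(0).normal(size=(50, 6))
adj_euclidean = rng_adjacency(variable_distances(X, "euclidean"))
adj_manhattan = rng_adjacency(variable_distances(X, "manhattan"))
print(concordance(adj_euclidean, adj_manhattan))
```

A concordance of 1 would mean the two measures induce the same neighborhood structure on this data; the statistical test mentioned in the abstract is not reproduced here.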
- Published
- 2021