1. DiscoPG
- Author
Angela Bonifati, Stefania Dumbrava, Emile Martinez, Fatemeh Ghasemi, Malo Jaffré, Pacôme Luton, Thomas Pickles, Université Claude Bernard Lyon 1 - Faculté des sciences (UCBL FS), Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon, Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL), Université de Lyon-Université de Lyon-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS), Méthodes et modèles pour les réseaux (METHODES-SAMOVAR), Services répartis, Architectures, MOdélisation, Validation, Administration des Réseaux (SAMOVAR), Institut Mines-Télécom [Paris] (IMT)-Télécom SudParis (TSP)-Institut Mines-Télécom [Paris] (IMT)-Télécom SudParis (TSP), Institut Polytechnique de Paris (IP Paris), Ecole Nationale Supérieure d'Informatique pour l'Industrie et l'Entreprise (ENSIIE), and École normale supérieure de Lyon (ENS de Lyon)
- Subjects
Graph databases, [INFO.INFO-DB] Computer Science [cs]/Databases [cs.DB], [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG], Nodes, General Engineering, ACM: H.: Information Systems/H.2: DATABASE MANAGEMENT, Property graphs, Graph applications
- Abstract
Property graphs are becoming pervasive in a variety of graph processing applications that use interconnected data. They can encode multi-labeled nodes and edges, as well as their properties, represented as key/value pairs. Although property graphs are widely used in several open-source and commercial graph databases, they lack a schema definition, unlike their relational counterparts. The property graph schema discovery problem consists of extracting the underlying schema concepts and types from such graph datasets. We showcase DiscoPG, a system for efficiently and accurately discovering and exploring property graph schemas. To this end, it leverages hierarchical clustering using a Gaussian Mixture Model, which accounts for both node labels and properties. DiscoPG allows users to perform schema discovery for both static and dynamic graph datasets. Suitable visualization layouts and dedicated dashboards let users inspect the static and dynamic inferred schema over the node clusters, as well as the differences in runtimes and clustering quality. To the best of our knowledge, DiscoPG is the first system to tackle the property graph schema discovery problem. As such, it supports the insightful exploration of the graph schema components and their evolving behavior, while revealing the underpinnings of the clustering-based discovery process.
- Published
- 2022
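
To make the schema discovery problem stated in the abstract concrete, here is a minimal Python sketch of a naive baseline. It is not DiscoPG's method: the abstract describes hierarchical clustering with a Gaussian Mixture Model, whereas this sketch simply groups nodes by their exact label set and then collects the property keys, value types, and optionality seen in each group. The toy graph, the `naive_schema` function, and the output layout are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy property graph: each node carries a set of labels
# and a dictionary of key/value properties.
nodes = [
    {"labels": {"Person"}, "props": {"name": "Ada", "age": 36}},
    {"labels": {"Person"}, "props": {"name": "Alan"}},
    {"labels": {"Person", "Author"}, "props": {"name": "Grace", "h_index": 40}},
    {"labels": {"Paper"}, "props": {"title": "DiscoPG", "year": 2022}},
]


def naive_schema(nodes):
    """Group nodes by exact label set and summarize their properties.

    For every property key in a group, record the value type names seen
    and whether the key is present on all nodes of that group.
    """
    groups = defaultdict(list)
    for node in nodes:
        groups[frozenset(node["labels"])].append(node["props"])

    schema = {}
    for labels, prop_dicts in groups.items():
        # Union of all property keys observed in this label group.
        keys = set().union(*(p.keys() for p in prop_dicts))
        schema[labels] = {
            key: {
                "types": {type(p[key]).__name__ for p in prop_dicts if key in p},
                "mandatory": all(key in p for p in prop_dicts),
            }
            for key in keys
        }
    return schema


schema = naive_schema(nodes)
```

A baseline like this treats every distinct label set as its own type, so it cannot place unlabeled nodes or merge nodes whose labels differ but whose properties match. Those are the kinds of cases where a clustering approach that weighs both labels and properties, as the abstract describes, would be expected to help.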