101 results on '"Graph Indexing"'
Search Results
2. The Ring: Worst-case Optimal Joins in Graph Databases using (Almost) No Extra Space.
- Author
-
Arroyuelo, Diego, Gómez-Brandón, Adrián, Hogan, Aidan, Navarro, Gonzalo, Reutter, Juan, Rojas-Ledesma, Javiel, and Soto, Adrián
- Subjects
- *
DIRECTED graphs , *GRAPH algorithms , *RELATIONAL databases , *TIME complexity , *COMPUTER science , *DATA structures , *DATA security failures - Published
- 2024
- Full Text
- View/download PDF
3. An Experimental Evaluation of Summarisation-Based Frequent Subgraph Mining for Subgraph Searching
- Author
-
Wangmo, Chimi and Wiese, Lena
- Published
- 2024
- Full Text
- View/download PDF
4. GRAPES-DD: exploiting decision diagrams for index-driven search in biological graph databases
- Author
-
Nicola Licheri, Vincenzo Bonnici, Marco Beccuti, and Rosalba Giugno
- Subjects
Query processing ,Pattern matching ,Subgraph isomorphism ,Graph indexing ,Decision diagrams ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Biology (General) ,QH301-705.5 - Abstract
Abstract Background Graphs are mathematical structures widely used for expressing relationships among elements when representing biomedical and biological information. On top of these representations, several analyses are performed. A common task is the search of one substructure within one graph, called target. The problem is referred to as one-to-one subgraph search, and it is known to be NP-complete. Heuristics and indexing techniques can be applied to facilitate the search. Indexing techniques are also exploited in the context of searching in a collection of target graphs, referred to as one-to-many subgraph problem. Filter-and-verification methods that use indexing approaches provide a fast pruning of target graphs or parts of them that do not contain the query. The expensive verification phase is then performed only on the subset of promising targets. Indexing strategies extract graph features at a sufficient granularity level for performing a powerful filtering step. Features are memorized in data structures allowing an efficient access. Indexing size, querying time and filtering power are key points for the development of efficient subgraph searching solutions. Results An existing approach, GRAPES, has been shown to have good performance in terms of speed-up for both one-to-one and one-to-many cases. However, it suffers in the size of the built index. For this reason, we propose GRAPES-DD, a modified version of GRAPES in which the indexing structure has been replaced with a Decision Diagram. Decision Diagrams are a broad class of data structures widely used to encode and manipulate functions efficiently. Experiments on biomedical structures and synthetic graphs have confirmed our expectation showing that GRAPES-DD has substantially reduced the memory utilization compared to GRAPES without worsening the searching time. Conclusion The use of Decision Diagrams for searching in biochemical and biological graphs is completely new and potentially promising thanks to their ability to encode compactly sets by exploiting their structure and regularity, and to manipulate entire sets of elements at once, instead of exploring each single element explicitly. Search strategies based on Decision Diagram makes the indexing for biochemical graphs, and not only, more affordable allowing us to potentially deal with huge and ever growing collections of biochemical and biological structures.
- Published
- 2021
- Full Text
- View/download PDF
5. The Rational Construction of a Wheeler DFA
- Author
-
Inenaga, S, Puglisi, SJ, Manzini, G, Policriti, A, Prezza, N, Riccardi, B, Manzini G., Policriti A., Prezza N., Riccardi B., Inenaga, S, Puglisi, SJ, Manzini, G, Policriti, A, Prezza, N, Riccardi, B, Manzini G., Policriti A., Prezza N., and Riccardi B.
- Abstract
Deterministic Finite Wheeler Automata are a natural generalisation to regular languages of the theory of compressed data structures originated by the introduction of the Burrows-Wheeler transform. Indeed, if we can find a Wheeler automaton recognizing a given language L, such automaton can be used to design time and space efficient algorithms for representing and searching L. In this paper we introduce an alternative representation of Deterministic Wheeler Automata by showing that a natural map between strings and rational numbers in Qr0, 1q can be extended to represent the automaton’s states as intervals in Qr0, 1q. Using this representation it emerges a natural relationship between automata properties and some properties of real numbers. In addition, such representation enables us to formulate problems related to automata in a numerical setting. Although at the moment the numerical approach does not lead to time efficient algorithms, we believe this new perspective deserves further consideration. As a further demonstration of the convenience of this new representation, we use it to provide a simple proof of an unexpected result on regular languages. More precisely, we compare the size of the smallest Wheeler automaton recognizing a given language L with respect to the size of the smallest automaton, possibly non-Wheeler, recognizing the same language. We show settings in which there can be an exponential gap between the two sizes, and we discuss the implications of this result on the problem of representing regular languages.
- Published
- 2024
6. Worst-Case-Optimal Similarity Joins on Graph Databases
- Author
-
Arroyuelo, Diego, Bustos, Benjamin, Gómez-Brandón, Adrián, Hogan, Aidan, Navarro, Gonzalo, Reutter, Juan, Arroyuelo, Diego, Bustos, Benjamin, Gómez-Brandón, Adrián, Hogan, Aidan, Navarro, Gonzalo, and Reutter, Juan
- Abstract
[Absctract]: We extend the concept of worst-case optimal equijoins in graph databases to the case where some nodes are required to be within the k-nearest neighbors (kNN) of others under some similarity function. We model the problem by superimposing the database graph with the kNN graph and show that a variant of Leapfrog TrieJoin (LTJ) implemented over a compact data structure called the Ring can be seamlessly extended to integrate similarity clauses with the equijoins in the LTJ query process, retaining worst-case optimality in many relevant cases. Our experiments on a benchmark that combines Wikidata and IMGpedia show that our enhanced LTJ algorithm outperforms by a considerable margin a baseline that first applies classic LTJ and then completes the query by applying the similarity predicates. The difference is more pronounced on queries where the similarity clauses are more densely connected to the query, becoming of an order of magnitude in some cases.
- Published
- 2024
7. Improved Graph Indexing Algorithms for Label-Constrained Reachability Queries.
- Author
-
Chakraborty, Sankardeep, Najafi, Mohammad, and Satti, Srinivasa Rao
- Subjects
SOCIAL networks ,CHARTS, diagrams, etc. ,DATA analysis ,JAVA programming language ,REFUSE collection - Abstract
Nowadays graph data have become absolutely ubiquitous in various applications starting from social/road networks to bio-medical data etc. Given such graph data, a reachability query asks if there exists a path from a source vertex to a target vertex in the graph. Due to its immense implications in both theory and applied domains, this query and many of its variants have been extensively studied in the literature. One such variant investigates the reachability between two vertices in an edge-labeled graph while constraining the label set simultaneously. This problem has recently been addressed by Valstar et al. [SIGMOD'17] who proposed an approach called the landmark indexing (LI) to support faster label-constrained reachability (LCR) queries. In this work, we introduce a simple, practical and space-efficient solution for answering LCR queries even faster. The experimental evaluation shows signiffcant time and space efficiency beneffts of our proposed solution over the LI approach for this problem in both real-world and synthetic graphs. [ABSTRACT FROM AUTHOR]
- Published
- 2021
8. MSQ-Index: A Succinct Index for Fast Graph Similarity Search.
- Author
-
Chen, Xiaoyang, Huo, Hongwei, Huan, Jun, Vitter, Jeffrey Scott, Zheng, Weiguo, and Zou, Lei
- Subjects
- *
MOLECULAR graphs , *PATTERN recognition systems , *DATA mining , *SOURCE code , *CHEMICAL structure , *DATA structures - Abstract
Graph similarity search under the graph edit distance constraint has received considerable attention in many applications, such as bioinformatics, data mining, pattern recognition and social networks. Existing methods for this problem have limited scalability because of the huge amount of memory they consume when handling very large graph databases with tens of millions of graphs. In this article, we present a succinct index that incorporates succinct data structures and hybrid encoding to achieve improved query time performance with minimal space usage. Specifically, the space usage of our index requires only 5–15 percent of the previous state-of-the-art indexing size while at the same time achieving several times acceleration in query time on the tested data. We also improve the query performance by augmenting the global filter with range searching, which allows us to perform similarity search in a reduced region. In addition, we propose two effective lower bounds together with a boosting technique to obtain the smallest possible candidate set. Extensive experiments demonstrate that our proposed approach is superior both in space and filtering to the state-of-the-art approaches. To the best of our knowledge, our index is the first in-memory index for this problem that successfully scales to cope with the large dataset of 25 million chemical structure graphs from the PubChem dataset. The source code is available online. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
9. GRAPES-DD: exploiting decision diagrams for index-driven search in biological graph databases.
- Author
-
Licheri, Nicola, Bonnici, Vincenzo, Beccuti, Marco, and Giugno, Rosalba
- Subjects
BIOLOGICAL databases ,CHARTS, diagrams, etc. ,ELECTRIC power filters ,MORPHOLOGY ,DATA structures ,PATTERN matching ,RDF (Document markup language) - Abstract
Background: Graphs are mathematical structures widely used for expressing relationships among elements when representing biomedical and biological information. On top of these representations, several analyses are performed. A common task is the search of one substructure within one graph, called target. The problem is referred to as one-to-one subgraph search, and it is known to be NP-complete. Heuristics and indexing techniques can be applied to facilitate the search. Indexing techniques are also exploited in the context of searching in a collection of target graphs, referred to as one-to-many subgraph problem. Filter-and-verification methods that use indexing approaches provide a fast pruning of target graphs or parts of them that do not contain the query. The expensive verification phase is then performed only on the subset of promising targets. Indexing strategies extract graph features at a sufficient granularity level for performing a powerful filtering step. Features are memorized in data structures allowing an efficient access. Indexing size, querying time and filtering power are key points for the development of efficient subgraph searching solutions. Results: An existing approach, GRAPES, has been shown to have good performance in terms of speed-up for both one-to-one and one-to-many cases. However, it suffers in the size of the built index. For this reason, we propose GRAPES-DD, a modified version of GRAPES in which the indexing structure has been replaced with a Decision Diagram. Decision Diagrams are a broad class of data structures widely used to encode and manipulate functions efficiently. Experiments on biomedical structures and synthetic graphs have confirmed our expectation showing that GRAPES-DD has substantially reduced the memory utilization compared to GRAPES without worsening the searching time. Conclusion: The use of Decision Diagrams for searching in biochemical and biological graphs is completely new and potentially promising thanks to their ability to encode compactly sets by exploiting their structure and regularity, and to manipulate entire sets of elements at once, instead of exploring each single element explicitly. Search strategies based on Decision Diagram makes the indexing for biochemical graphs, and not only, more affordable allowing us to potentially deal with huge and ever growing collections of biochemical and biological structures. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
10. Reachability in big graphs: A distributed indexing and querying approach.
- Author
-
Hocine, Imane, Yahiaoui, Saïd, Bendjoudi, Ahcene, and Nouali-Taboudjemat, Nadia
- Subjects
- *
PARALLEL processing , *DISTRIBUTED computing , *SUBGRAPHS - Abstract
The advent of Big graphs characterized by their enormous number of nodes, with multiple edges between them makes the existing reachability query indexing approaches unable to guarantee a reasonable time for both the index construction and query steps. Therefore a novel approach that takes into account these new characteristics during the graph processing is needed. In this paper, we propose an Overlay Graph-based Distributed Reachability Indexing approach (ODRI), an indexing scheme through which the index construction and reachability query are processed in a parallel and distributed manner. The key idea of ODRI is to process a Big graph as a set of smaller subgraphs (partitions) interconnected to each other through an overlay graph. In this way, the partitions can be indexed in parallel and, at the same time, the reachability information can also be extracted. Hence, the index construction and query processing time will be reduced significantly. Therefore, ODRI ensures the scalability of Big graphs, which is a challenge for the existing reachability approaches. Besides, we formally prove that this strategy preserves the reachability properties. Using real-life data, we experimentally verify that our approach outperforms the state-of-the-art methods, and is scalable in terms of the number of partitions, regardless of how graphs are distributed. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
11. Efficient access methods for very large distributed graph databases.
- Author
-
Luaces, David, Viqueira, José R.R., Cotos, José M., and Flores, Julián C.
- Subjects
- *
DISTRIBUTED databases , *DATABASES - Abstract
• Indexing techniques are essential in large scale subgraph searching. • Three new indexing techniques proposed, which leverage the use of bitmaps. • Generic framework for filter-then-verify implementations on top of Apache Spark. • Evaluation shows that different indexes are suitable for different query selectivities. • A distributed approach is essential for very large databases and low selective queries. Subgraph searching is an essential problem in graph databases, but it is also challenging due to the involved subgraph isomorphism NP-Complete sub-problem. Filter-Then-Verify (FTV) methods mitigate performance overheads by using an index to prune out graphs that do not fit the query in a filtering stage, reducing the number of subgraph isomorphism evaluations in a subsequent verification stage. Subgraph searching has to be applied to very large databases (tens of millions of graphs) in real applications such as molecular substructure searching. Previous surveys have identified the FTV solutions GraphGrepSX (GGSX) and CT-Index as the best ones for large databases (thousands of graphs), however they cannot reach reasonable performance on very large ones (tens of millions graphs). This paper proposes a generic approach for the distributed implementation of FTV solutions. Besides, three previous methods that improve the performance of GGSX and CT-Index are adapted to be executed in clusters. The evaluation shows how the achieved solutions provide a great performance improvement (between 70% and 90% of filtering time reduction) in a centralized configuration and how they may be used to achieve efficient subgraph searching over very large databases in cluster configurations. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
12. A Minimal Rare Substructures-Based Model for Graph Database Indexing
- Author
-
Azaouzi, Mehdi, Ben Romdhane, Lotfi, Madureira, Ana Maria, editor, Abraham, Ajith, editor, Gamboa, Dorabela, editor, and Novais, Paulo, editor
- Published
- 2017
- Full Text
- View/download PDF
13. Modified MinG Algorithm to Find Top-K Shortest Paths from large RDF Graphs
- Author
-
Hassan, Zohaib, Qadir, Mohammad Abdul, Islam, Muhammad Arshad, Shahzad, Umer, Akhter, Nadeem, Diniz Junqueira Barbosa, Simone, Series editor, Chen, Phoebe, Series editor, Du, Xiaoyong, Series editor, Filipe, Joaquim, Series editor, Kara, Orhun, Series editor, Kotenko, Igor, Series editor, Liu, Ting, Series editor, Sivalingam, Krishna M., Series editor, Washio, Takashi, Series editor, Sack, Harald, editor, Dietze, Stefan, editor, Tordai, Anna, editor, and Lange, Christoph, editor
- Published
- 2016
- Full Text
- View/download PDF
14. Efficient query over large datasets of analytical chemistry
- Author
-
Ríos Viqueira, José Ramón, Fernández Pena, Tomás, Universidade de Santiago de Compostela. Escola de Doutoramento Internacional (EDIUS), Universidade de Santiago de Compostela. Programa de Doutoramento en Investigación en Tecnoloxías da Información, Luaces Cachaza, David, Ríos Viqueira, José Ramón, Fernández Pena, Tomás, Universidade de Santiago de Compostela. Escola de Doutoramento Internacional (EDIUS), Universidade de Santiago de Compostela. Programa de Doutoramento en Investigación en Tecnoloxías da Información, and Luaces Cachaza, David
- Abstract
The efficient management of molecular data is one of the most demanded technologies by the industry. A very important type of search is the substructure searching. The molecular structures may be encoded as graphs where the vertices and bonds represent the atoms and bonds, respectively. In this Thesis, a cutting edge system that enables the storage and querying of molecular data has been designed and implemented, paying attention to the molecular substructure search, where new filter-then-verify(FTV) methods, beyond the state-of-the-art, were designed, implemented, and tested, achieving performance gains over 75% in the filtering stage. A generic framework for the implementation of FTV techniques on a distributed architecture was also developed, enabling the application of the FTV methods on very large graph databases, achieving a great performance gain in both index building and query execution. Finally, the Thesis presents a study for the use of different FTV solutions to obtain approximate results in an interactive searching application.
- Published
- 2023
15. Faster Prefix-Sorting Algorithms for Deterministic Finite Automata
- Author
-
Sung-Hwan Kim and Francisco Olivares and Nicola Prezza, Kim, Sung-Hwan, Olivares, Francisco, Prezza, Nicola, Sung-Hwan Kim and Francisco Olivares and Nicola Prezza, Kim, Sung-Hwan, Olivares, Francisco, and Prezza, Nicola
- Abstract
Sorting is a fundamental algorithmic pre-processing technique which often allows to represent data more compactly and, at the same time, speeds up search queries on it. In this paper, we focus on the well-studied problem of sorting and indexing string sets. Since the introduction of suffix trees in 1973, dozens of suffix sorting algorithms have been described in the literature. In 2017, these techniques were extended to sets of strings described by means of finite automata: the theory of Wheeler graphs [Gagie et al., TCS'17] introduced automata whose states can be totally-sorted according to the co-lexicographic (co-lex in the following) order of the prefixes of words accepted by the automaton. More recently, in [Cotumaccio, Prezza, SODA'21] it was shown how to extend these ideas to arbitrary automata by means of partial co-lex orders. This work showed that a co-lex order of minimum width (thus optimizing search query times) on deterministic finite automata (DFAs) can be computed in O(m² + n^{5/2}) time, m being the number of transitions and n the number of states of the input DFA. In this paper, we exhibit new combinatorial properties of the minimum-width co-lex order of DFAs and exploit them to design faster prefix sorting algorithms. In particular, we describe two algorithms sorting arbitrary DFAs in O(mn) and O(n² log n) time, respectively, and an algorithm sorting acyclic DFAs in O(m log n) time. Within these running times, all algorithms compute also a smallest chain partition of the partial order (required to index the DFA). We present an experiment result to show that an optimized implementation of the O(n² log n)-time algorithm exhibits a nearly-linear behaviour on large deterministic pan-genomic graphs and is thus also of practical interest.
- Published
- 2023
- Full Text
- View/download PDF
16. Large-Scale Graph Indexing Using Binary Embeddings of Node Contexts
- Author
-
Riba, Pau, Lladós, Josep, Fornés, Alicia, Dutta, Anjan, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Liu, Cheng-Lin, editor, Luo, Bin, editor, Kropatsch, Walter G., editor, and Cheng, Jian, editor
- Published
- 2015
- Full Text
- View/download PDF
17. Distributed K-Distance Indexing Approach for Efficient Shortest Path Discovery on Large Graphs
- Author
-
Hong, Jihye, Kim, Hyunwook, Nawaz, Waqas, Park, Kisung, Jeong, Byeong-Soo, Lee, Young-Koo, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Kobsa, Alfred, Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Han, Wook-Shin, editor, Lee, Mong Li, editor, Muliantara, Agus, editor, Sanjaya, Ngurah Agus, editor, Thalheim, Bernhard, editor, and Zhou, Shuigeng, editor
- Published
- 2014
- Full Text
- View/download PDF
18. Graph Database Retrieval Based on Metric-Trees
- Author
-
Serratosa, Francesc, Cortés, Xavier, Solé-Ribalta, Albert, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Gimel’farb, Georgy, editor, Hancock, Edwin, editor, Imiya, Atsushi, editor, Kuijper, Arjan, editor, Kudo, Mineichi, editor, Omachi, Shinichiro, editor, Windeatt, Terry, editor, and Yamada, Keiji, editor
- Published
- 2012
- Full Text
- View/download PDF
19. Indexing Simple Graphs by Means of the Resistance Distance
- Author
-
Chatchawit Aporntewan, Prabhas Chongstitvatana, and Nachol Chaiyaratana
- Subjects
Electrical resistance ,graph indexing ,graph isomorphism ,resistance distance ,simple connected graphs ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
For every simple connected graph, we present a polynomial time algorithm for computing a numerical index, which is composed of primary and secondary parts. Given a graph G = (V, E) where V and E are, respectively, vertex and edge sets, the primary part of the index is a set of |V | fractions and the secondary part of the index is a set of |B| x |V| fractions, where B is the partition of the vertex set V. Basically, each fraction in the primary and secondary parts is the electrical resistance between two vertices when every edge in the graph is replaced with a unit resistor (1 Ω). The experimental results show that our indexing algorithm produced a unique index for every simple connected graph with ≤10 vertices, including all graphs that are counterexamples for detecting graph isomorphism by resistance spectrum comparison. The strength of our indexing algorithm lies in its extreme simplicity. An index of a graph is solely derived from the determinants of reduced Laplacian matrices, which represent the graph. Therefore, the performance of our indexing algorithm only depends on how fast the matrix determinants can be computed.
- Published
- 2016
- Full Text
- View/download PDF
20. Business Process Model Retrieval Based on Graph Indexing Method
- Author
-
Rivas, Daniel Felipe, Corchuelo, David S., Figueroa, Cristhian, Corrales, Juan Carlos, Giugno, Rosalba, van der Aalst, Will, Series editor, Mylopoulos, John, Series editor, Sadeh, Norman M., Series editor, Shaw, Michael J., Series editor, Szyperski, Clemens, Series editor, zur Muehlen, Michael, editor, and Su, Jianwen, editor
- Published
- 2011
- Full Text
- View/download PDF
21. K-nn Queries in Graph Databases Using M-Trees
- Author
-
Serratosa, Francesc, Solé-Ribalta, Albert, Cortés, Xavier, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Real, Pedro, editor, Diaz-Pernil, Daniel, editor, Molina-Abril, Helena, editor, Berciano, Ainhoa, editor, and Kropatsch, Walter, editor
- Published
- 2011
- Full Text
- View/download PDF
22. Graph Indexing and Retrieval Based on Median Graphs
- Author
-
Serratosa, Francesc, Solé-Ribalta, Albert, Vidiella, Enric, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Martínez-Trinidad, José Francisco, editor, and Carrasco-Ochoa, Jesús Ariel, editor
- Published
- 2010
- Full Text
- View/download PDF
23. DSI: A Method for Indexing Large Graphs Using Distance Set
- Author
-
Kou, Yubo, Li, Yukun, Meng, Xiaofeng, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Chen, Lei, editor, Tang, Changjie, editor, Yang, Jun, editor, and Gao, Yunjun, editor
- Published
- 2010
- Full Text
- View/download PDF
24. Subgraph spotting in graph representations of comic book images.
- Author
-
Le, Thanh Nam, Luqman, Muhammad Muzzamil, Dutta, Anjan, Héroux, Pierre, Rigaud, Christophe, Guérin, Clément, Foggia, Pasquale, Burie, Jean-Christophe, Ogier, Jean-Marc, Lladós, Josep, and Adam, Sébastien
- Subjects
- *
REPRESENTATIONS of graphs , *COMIC books, strips, etc. , *IMAGE analysis , *SEARCH algorithms , *FEATURE extraction - Abstract
Highlights • Problem of lack of freely available ground-truthed datasets for subgraph spotting. • New SSGCI dataset for subgraph spotting in graph representations of comic book images. • Description of groundtruth of the new SSGCI dataset and evaluation protocol. • Results of two state-of-the-art methods of subgraph spotting on the new SSGCI dataset. • Foster and facilitate research for developing new subgraph spotting techniques and use of structural information for CBIR and QBE. Abstract Graph-based representations are the most powerful data structures for extracting, representing and preserving the structural information of underlying data. Subgraph spotting is an interesting research problem, especially for studying and investigating the structural information based content-based image retrieval (CBIR) and query by example (QBE) in image databases. In this paper we address the problem of lack of freely available ground-truthed datasets for subgraph spotting and present a new dataset for subgraph spotting in graph representations of comic book images (SSGCI) with its ground-truth and evaluation protocol. Experimental results of two state-of-the-art methods of subgraph spotting are presented on the new SSGCI dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
25. Top-K Correlation Sub-graph Search in Graph Databases
- Author
-
Zou, Lei, Chen, Lei, Lu, Yansheng, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Zhou, Xiaofang, editor, Yokota, Haruo, editor, Deng, Ke, editor, and Liu, Qing, editor
- Published
- 2009
- Full Text
- View/download PDF
26. GRAPES-DD: exploiting decision diagrams for index-driven search in biological graph databases
- Author
-
Rosalba Giugno, Vincenzo Bonnici, Marco Beccuti, and Nicola Licheri
- Subjects
Theoretical computer science ,Query processing ,Databases, Factual ,Abstracting and Indexing ,Computer science ,QH301-705.5 ,0206 medical engineering ,Subgraph isomorphism problem ,Computer applications to medicine. Medical informatics ,R858-859.7 ,02 engineering and technology ,computer.software_genre ,Biochemistry ,Decision diagrams ,03 medical and health sciences ,Structural Biology ,decision diagrams, subgraph isomorphism, graph indexing, pattern matching ,Subgraph isomorphism ,Influence diagram ,Vitis ,Pruning (decision trees) ,Biology (General) ,Pattern matching ,Molecular Biology ,030304 developmental biology ,0303 health sciences ,Graph database ,Methodology Article ,Applied Mathematics ,Search engine indexing ,Data structure ,Computer Science Applications ,Graph indexing ,Index (publishing) ,Heuristics ,computer ,Algorithms ,020602 bioinformatics - Abstract
Background Graphs are mathematical structures widely used for expressing relationships among elements when representing biomedical and biological information. On top of these representations, several analyses are performed. A common task is the search of one substructure within one graph, called target. The problem is referred to as one-to-one subgraph search, and it is known to be NP-complete. Heuristics and indexing techniques can be applied to facilitate the search. Indexing techniques are also exploited in the context of searching in a collection of target graphs, referred to as one-to-many subgraph problem. Filter-and-verification methods that use indexing approaches provide a fast pruning of target graphs or parts of them that do not contain the query. The expensive verification phase is then performed only on the subset of promising targets. Indexing strategies extract graph features at a sufficient granularity level for performing a powerful filtering step. Features are memorized in data structures allowing an efficient access. Indexing size, querying time and filtering power are key points for the development of efficient subgraph searching solutions. Results An existing approach, GRAPES, has been shown to have good performance in terms of speed-up for both one-to-one and one-to-many cases. However, it suffers in the size of the built index. For this reason, we propose GRAPES-DD, a modified version of GRAPES in which the indexing structure has been replaced with a Decision Diagram. Decision Diagrams are a broad class of data structures widely used to encode and manipulate functions efficiently. Experiments on biomedical structures and synthetic graphs have confirmed our expectation showing that GRAPES-DD has substantially reduced the memory utilization compared to GRAPES without worsening the searching time. Conclusion The use of Decision Diagrams for searching in biochemical and biological graphs is completely new and potentially promising thanks to their ability to encode compactly sets by exploiting their structure and regularity, and to manipulate entire sets of elements at once, instead of exploring each single element explicitly. Search strategies based on Decision Diagram makes the indexing for biochemical graphs, and not only, more affordable allowing us to potentially deal with huge and ever growing collections of biochemical and biological structures.
- Published
- 2021
27. Fast processing of graph queries on a large database of small and medium-sized data graphs.
- Author
-
Pal, Dipali, Rao, Praveen, Slavov, Vasil, and Katib, Anas
- Subjects
- *
GRAPH theory , *QUERYING (Computer science) , *DATABASES , *SUBGRAPHS , *ISOMORPHISM (Mathematics) , *SEARCH algorithms - Abstract
We propose a new way of indexing a large database of small and medium-sized graphs and processing exact subgraph matching (or subgraph isomorphism) and approximate (full) graph matching queries. Rather than decomposing a graph into smaller units ( e.g. , paths, trees, graphs) for indexing purposes, we represent each graph in the database by its graph signature, which is essentially a multiset. We construct a disk-based index on all the signatures via bulk loading. During query processing, a query graph is also mapped into its signature, and this signature is searched using the index by performing multiset operations. To improve the precision of exact subgraph matching, we develop a new scheme using the concept of line graphs . Through extensive evaluation on real and synthetic graph datasets, we demonstrate that our approach provides a scalable and efficient disk-based solution for a large database of small and medium-sized graphs. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
28. Answering Reachability Queries on Incrementally Updated Graphs by Hierarchical Labeling Schema.
- Author
-
Wong, Tak-Lam
- Subjects
DIRECTED acyclic graphs ,GRAPH theory - Abstract
We proposed a novel solution schema called the Hierarchical Labeling Schema (HLS) to answer reachability queries in directed graphs. Different from many existing approaches that focus on static directed acyclic graphs (DAGs), our schema focuses on directed cyclic graphs (DCGs) where vertices or arcs could be added to a graph incrementally. Unlike many of the traditional approaches, HLS does not require the graph to be acyclic in constructing its index. Therefore, it could, in fact, be applied to both DAGs and DCGs. When vertices or arcs are added to a graph, the HLS is capable of updating the index incrementally instead of re-computing the index from the scratch each time, making it more efficient than many other approaches in the practice. The basic idea of HLS is to create a tree for each vertex in a graph and link the trees together so that whenever two vertices are given, we can immediately know whether there is a path between them by referring to the appropriate trees. We conducted extensive experiments on both real-world datasets and synthesized datasets. We compared the performance of HLS, in terms of index construction time, query processing time and space consumption, with two state-of-the-art methodologies, the path-tree method and the 3-hop method. We also conducted simulations to model the situation when a graph is updated incrementally. The performance comparison of different algorithms against HLS on static graphs has also been studied. Our results show that HLS is highly competitive in the practice and is particularly useful in the cases where the graphs are updated frequently. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
29. Efficient access methods for very large distributed graph databases
- Author
-
Universidade de Santiago de Compostela. Centro de Investigación en Tecnoloxías da Información, Universidade de Santiago de Compostela. Departamento de Electrónica e Computación, Luaces Cachaza, David, Ríos Viqueira, José Ramón, Cotos Yáñez, José Manuel, Flores González, Julián Carlos, Universidade de Santiago de Compostela. Centro de Investigación en Tecnoloxías da Información, Universidade de Santiago de Compostela. Departamento de Electrónica e Computación, Luaces Cachaza, David, Ríos Viqueira, José Ramón, Cotos Yáñez, José Manuel, and Flores González, Julián Carlos
- Abstract
Subgraph searching is an essential problem in graph databases, but it is also challenging due to the involved subgraph isomorphism NP-Complete sub-problem. Filter-Then-Verify (FTV) methods mitigate performance overheads by using an index to prune out graphs that do not fit the query in a filtering stage, reducing the number of subgraph isomorphism evaluations in a subsequent verification stage. Subgraph searching has to be applied to very large databases (tens of millions of graphs) in real applications such as molecular substructure searching. Previous surveys have identified the FTV solutions GraphGrepSX (GGSX) and CT-Index as the best ones for large databases (thousands of graphs), however they cannot reach reasonable performance on very large ones (tens of millions graphs). This paper proposes a generic approach for the distributed implementation of FTV solutions. Besides, three previous methods that improve the performance of GGSX and CT-Index are adapted to be executed in clusters. The evaluation shows how the achieved solutions provide a great performance improvement (between 70% and 90% of filtering time reduction) in a centralized configuration and how they may be used to achieve efficient subgraph searching over very large databases in cluster configurations
- Published
- 2021
30. Tree decomposition-based indexing for efficient shortest path and nearest neighbors query answering on graphs.
- Author
-
Wei-Kleiner, Fang
- Subjects
- *
K-nearest neighbor classification , *PROBLEM solving , *QUERYING (Computer science) , *TREE graphs , *ALGORITHMS , *INDEXING - Abstract
We propose TEDI, an indexing for solving shortest path, and k Nearest Neighbors (kNN) problems. TEDI is based on the tree decomposition methodology. The graph is first decomposed into a tree in which the node contains vertices. The shortest paths are stored in such nodes. These local shortest paths together with the tree structure constitute the index of the graph. Based on this index, algorithms can be executed to solve the shortest path, as well as the kNN problem more efficiently. Our experimental results show that TEDI offers orders-of-magnitude performance improvement over existing approaches on the index construction time, the index size and the query answering. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
31. Degree Reduction in Labeled Graph Retrieval.
- Author
-
Ullmann, Julian R.
- Subjects
CONSTRAINT satisfaction ,SUBGRAPHS - Abstract
Within a given collection of graphs, a graph retrieval system may seek all graphs that contain a given graph, or may instead seek all graphs that are contained within a given graph. Although subgraph isomorphism is worst-case exponential, it may be average-case polynomial if graphs are labeled so as to restrict possible correspondences between vertices of included and includer graphs. Degree reduction is a procedure that uses logical inference to preclude some such correspondences, thereby substantially increasing the size of includer graphs that can be processed, without preventing any existent isomorphism from being found. Degree reduction works only with labeled graphs, which may be directed or undirected, with or without edge labels. Inexact or approximate isomorphism is accommodated by reducing strictness of conditions for perfect isomorphism. Disk-based degree reduction, which is an order of magnitude slower than memory-based degree reduction, has successfully processed graphs that have millions of vertices. Although the principle of degree reduction is simple and fundamental, its efficient practical implementation involves intricate procedural detail. Its average-case complexity analysis is currently intractable, so cost-benefit assessment has to be experimental. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
32. iTri: Index-based triangle listing in massive graphs.
- Author
-
Rasel, Mostofa Kamal, Han, Yongkoo, Kim, Jinseung, Park, Kisung, Tu, Nguyen Anh, and Lee, Young-Koo
- Subjects
- *
GRAPH theory , *PROBLEM solving , *ALGORITHMS , *DATA mining , *PARALLEL computers - Abstract
Triangle listing is a basic operator when dealing with many graph problems. However, in-memory algorithms do not work well with recently developed massive graphs such as social networks because these graphs cannot be accommodated in the memory. Thus, external memory-based algorithms have been proposed recently, but these approaches still require frequent multiple scans of the whole graph on the disk and large volumes of calculations are performed that involve the whole graph during every iteration. In this study, we propose a novel index-based method for listing triangles in massive graphs. First, we present new notions for the vertex range index and potential cone vertex index. Next, we propose an index join-based triangle listing algorithm. Our method accesses the indexed data asynchronously and joins them to list triangles using a multi-threaded parallel processing technique. Based on experiments, we demonstrate that our algorithm outperforms the state-of-the-art solution methods by three to eight times in terms of the wall clock time. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
33. Efficient processing of $$k$$ -hop reachability queries.
- Author
-
Cheng, James, Shang, Zechao, Cheng, Hong, Wang, Haixun, and Yu, Jeffrey
- Abstract
We study the problem of answering k -hop reachability queries in a directed graph, i.e., whether there exists a directed path of length $$k$$ , from a source query vertex to a target query vertex in the input graph. The problem of $$k$$ -hop reachability is a general problem of the classic reachability (where $$k=\infty $$ ). Existing indexes for processing classic reachability queries, as well as for processing shortest path distance queries, are not applicable or not efficient for processing $$k$$ -hop reachability queries. We propose an efficient index for processing $$k$$ -hop reachability queries. Our experimental results on a wide range of real datasets show that our method is efficient and scalable in terms of both index construction and query processing. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
34. Incremental Maintenance of the Minimum Bisimulation of Cyclic Graphs.
- Author
-
Deng, Jintian, Choi, Byron, Xu, Jianliang, Hu, Haibo, and Bhowmick, Sourav S.
- Subjects
- *
COMPUTER maintenance & repair , *BISIMULATION , *GRAPH theory , *APPLICATION software , *SEMANTIC Web , *ALGORITHMS - Abstract
There have been numerous recent applications of graph databases (e.g., the Semantic Web, ontology representation, social networks, XML, chemical databases, and biological databases). A fundamental structural index for data graphs, namely minimum bisimulation, has been reported useful for efficient path query processing and optimization including selectivity estimation, among many others. Data graphs are subject to change and their indexes are updated accordingly. This paper studies the incremental maintenance problem of the minimum bisimulation of a possibly cyclic data graph. While cyclic graphs are ubiquitous among the data on the web, previous work on the maintenance problem has mostly focused on acyclic graphs. To study the problem with cyclic graphs, we first show that the two existing classes of minimization algorithms—merging algorithm and partition refinement—have their strengths and weaknesses. Second, we propose a novel hybrid algorithm and its analytical model. This algorithm supports an edge insertion or deletion and two forms of batch insertions or deletions. To the best of our knowledge, this is the first maintenance algorithm that guarantees minimum bisimulation of cyclic graphs. Third, we propose to partially reuse the minimum bisimulation before an update in order to optimize maintenance performance. We present an experimental study on both synthetic and real-data graphs that verified the efficiency and effectiveness of our algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
35. Component retrieval based on a database of graphs for Hand-Written Electronic-Scheme Digitalisation
- Author
-
Serratosa, Francesc, Cortés, Xavier, and Solé-Ribalta, Albert
- Subjects
- *
INFORMATION retrieval , *DATABASES , *GRAPH theory , *DATA encryption , *COMPUTER-aided design software ,WRITING - Abstract
Abstract: Some companies use Hand-Written Electronic-Schemes or typed electronic-schemes but without the original digital format. Some times, it would be useful to have the digitalized format of these schemes to update some of their parts or design new ones through CAD programs. In this paper, we do not explain the whole electronic-scheme analysis application but we deep on the database module. This module is crucial if we want a high recognition ratio in a reasonable run-time. We represent the electronic components by graphs and we evaluate, on one hand, different structures useful to encode a set of graphs, and on the other hand, indexing structures for fast querying of these graphs. [Copyright &y& Elsevier]
- Published
- 2013
- Full Text
- View/download PDF
36. Hypergraph-based image retrieval for graph-based representation
- Author
-
Jouili, Salim and Tabbone, Salvatore
- Subjects
- *
HYPERGRAPHS , *IMAGE retrieval , *GRAPH theory , *MATHEMATICAL models , *DATA analysis , *DATABASES - Abstract
Abstract: In this paper, we introduce a novel method for graph indexing. We propose a hypergraph-based model for graph data sets by allowing cluster overlapping. More precisely, in this representation one graph can be assigned to more than one cluster. Using the concept of the graph median and a given threshold, the proposed algorithm detects automatically the number of classes in the graph database. We consider clusters as hyperedges in our hypergraph model and we index the graph set by the hyperedge centroids. This model is interesting to traverse the data set and efficient to retrieve graphs. [Copyright &y& Elsevier]
- Published
- 2012
- Full Text
- View/download PDF
37. W-TSV: Weighted topological signature vector for lexicon reduction in handwritten Arabic documents
- Author
-
Chherawala, Youssouf and Cheriet, Mohamed
- Subjects
- *
DIGITAL signatures , *LEXICON , *GRAPH theory , *CODING theory , *DIRECTED acyclic graphs , *GEOMETRIC analysis , *VECTOR spaces - Abstract
Abstract: This paper proposes a holistic lexicon-reduction method for ancient and modern handwritten Arabic documents. The word shape is represented by the weighted topological signature vector (W-TSV), which encodes graph data into a low-dimensional vector space. Three directed acyclic graph (DAG) representations are proposed for Arabic word shapes, based on topological and geometrical features. Lexicon reduction is achieved by a nearest neighbors search in the W-TSV space. The proposed framework has been tested on the IFN/ENIT and the Ibn Sina databases, achieving respectively a degree of reduction of 83.5% and 92.9% for an accuracy of reduction of 90%. [Copyright &y& Elsevier]
- Published
- 2012
- Full Text
- View/download PDF
38. Structure and attribute index for approximate graph matching in large graphs
- Author
-
Zhu, Linhong, Keong Ng, Wee, and Cheng, James
- Subjects
- *
GRAPH theory , *GRAPH algorithms , *ONLINE social networks , *REPRESENTATIONS of graphs , *MATCHING theory , *DATA analysis - Abstract
Abstract: The increasing popularity of graph data in various domains has lead to a renewed interest in developing efficient graph matching techniques, especially for processing large graphs. In this paper, we study the problem of approximate graph matching in a large attributed graph. Given a large attributed graph and a query graph, we compute a subgraph of the large graph that best matches the query graph. We propose a novel structure-aware and attribute-aware index to process approximate graph matching in a large attributed graph. We first construct an index on the similarity of the attributed graph, by partitioning the large search space into smaller subgraphs based on structure similarity and attribute similarity. Then, we construct a connectivity-based index to give a concise representation of inter-partition connections. We use the index to find a set of best matching paths. From these best matching paths, we compute the best matching answer graph using a greedy algorithm. Experimental results on real datasets demonstrate the efficiency of both index construction and query processing. We also show that our approach attains high-quality query answers. [Copyright &y& Elsevier]
- Published
- 2011
- Full Text
- View/download PDF
39. Fast graph query processing with a low-cost index.
- Author
-
Cheng, James, Ke, Yiping, Fu, Ada, and Yu, Jeffrey
- Abstract
This paper studies the problem of processing supergraph queries, that is, given a database containing a set of graphs, find all the graphs in the database of which the query graph is a supergraph. Existing works usually construct an index and performs a filtering-and-verification process, which still requires many subgraph isomorphism testings. There are also significant overheads in both index construction and maintenance. In this paper, we design a graph querying system that achieves both fast indexing and efficient query processing. The index is constructed by a simple but fast method of extracting the commonality among the graphs, which does not involve any costly operation such as graph mining. Our query processing has two key techniques, direct inclusion and filtering. Direct inclusion allows partial query answers to be included directly without candidate verification. Our filtering technique further reduces the candidate set by operating on a much smaller projected database. Experimental results show that our method is significantly more efficient than the existing works in both indexing and query processing, and our index has a low maintenance cost. [ABSTRACT FROM AUTHOR]
- Published
- 2011
- Full Text
- View/download PDF
40. Path-Tree: An Efficient Reachability Indexing Scheme for Large Directed Graphs.
- Author
-
Rouming Jin, Ning Ruan, Yang Xiang, and Haixun Wang
- Subjects
- *
INDEXING , *GRAPHIC methods , *QUERY (Information retrieval system) , *GRAPH labelings , *ALGORITHMS , *PATH analysis (Statistics) - Abstract
Reachability query is one of the fundamental queries in graph database. The main idea behind answering reachability queries is to assign vertices with certain labels such that the reachability between any two vertices can be determined by the labeling information. Though several approaches have been proposes for building these reachability labels, it remains open issues on how to handle increasingly large number of vertices in real-world graphs, and how to find the best tradeoff among the labeling size, the query answering time, and the construction time. In this article, we introduce a novel graph structure, referred to as pathtree, to help labeling very large graphs. The path-tree cover is a spanning subgraph of G in a tree shape. We show path-tree can be generalized to chain-tree which theoretically can has smaller labeling cost. On top of path-tree and chain-tree index, we also introduce a new compression scheme which groups vertices with similar labels together to further reduce the labeling size. In addition, we also propose an efficient incremental update algorithm for dynamic index maintenance. Finally, we demonstrate both analytically and empirically the effectiveness and efficiency of our new approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2011
- Full Text
- View/download PDF
41. Efficient algorithms for supergraph query processing on graph databases.
- Author
-
Shuo Zhang, Xiaofeng Gao, Weili Wu, Jianzhong Li, and Hong Gao
- Abstract
We study the problem of processing supergraph queries on graph databases. A graph database D is a large set of graphs. A supergraph query q on D is to retrieve all the graphs in D such that q is a supergraph of them. The large number of graphs in databases and the NP-completeness of subgraph isomorphism testing make it challenging to efficiently processing supergraph queries. In this paper, a new approach to processing supergraph queries is proposed. Specifically, a method for compactly organizing graph databases is first presented. Common subgraphs of the graphs in a database are stored only once in the compact organization of the database, in order to reduce the overall cost of subgraph isomorphism testings from the stored graphs to queries during query processing. Then, an exact algorithm and an approximate algorithm for generating the significant feature set with optimal order are proposed, followed by the algorithms for indices construction on graph databases. The optimal order on the feature set is to reduce the number of subgraph isomorphism testings during query processing. Based on the compact organization of graph databases, a novel algorithm for testing subgraph isomorphisms from multiple graphs to one graph is presented. Finally, based on all the above techniques, a query processing method is proposed. Analytical and experimental results show that the proposed algorithms outperform the existing similar algorithms by one to two orders of magnitude. [ABSTRACT FROM AUTHOR]
- Published
- 2011
- Full Text
- View/download PDF
42. Wheeler Languages
- Author
-
Alanko, J, D'Agostino, G, Policriti, A, and Prezza, N
- Subjects
FOS: Computer and information sciences ,Formal Languages and Automata Theory (cs.FL) ,Regular languages ,Finite automata ,Burrows-Wheeler transform ,Wheeler languages ,Graph indexing ,Computer Science - Formal Languages and Automata Theory ,0102 computer and information sciences ,Data_CODINGANDINFORMATIONTHEORY ,01 natural sciences ,Theoretical Computer Science ,03 medical and health sciences ,General Relativity and Quantum Cosmology ,Regular languagesFinite automataBurrows-Wheeler transformWheeler languagesGraph indexing ,Data_FILES ,Computer Science::Data Structures and Algorithms ,030304 developmental biology ,0303 health sciences ,Settore INF/01 - Informatica ,Computer Science Applications ,Computational Theory and Mathematics ,010201 computation theory & mathematics ,ComputerSystemsOrganization_MISCELLANEOUS ,Information Systems - Abstract
The recently introduced class of Wheeler graphs, inspired by the Burrows-Wheeler Transform (BWT) of a given string, admits an efficient index data structure for searching for subpaths with a given path label, and lifts the applicability of the Burrows-Wheeler transform from strings to languages. In this paper we study the regular languages accepted by automata having a Wheeler graph as transition function, and prove results on determination, Myhill_Nerode characterization, decidability, and closure properties for this class of languages.
- Published
- 2020
43. Distilling Structure from Imagery : Graph-based Models for the Interpretation of Document Images
- Author
-
Riba, Pau
- Subjects
Computer engineering. Computer hardware ,Graph-based Representations ,Computer Vision ,QA75.5-76.95 ,Pattern Recognition ,Hierarchical Graphs ,TK7885-7895 ,Document Analysis ,Electronic computers. Computer science ,Graph Indexing ,Graph Edit Distance ,Document analysis ,Computer vision ,Structural pattern recognition ,Computer Vision and Pattern Recognition ,Graph Neural Networks ,Graph Embeddings ,Software ,Structural Pattern Recognition ,Table Detection - Abstract
From its early stages, the community of Pattern Recognition and Computer Vision has considered the importance of leveraging the structural information when understanding images. Usually, graphs have been proposed as a suitable model to represent this kind of information due to their flexibility and representational power able to codify both, the components, objects, or entities and their pairwise relationship. Even though graphs have been successfully applied to a huge variety of tasks, as a result of their symbolic and relational nature, graphs have always suffered from some limitations compared to statistical approaches. Indeed, some trivial mathematical operations do not have an equivalence in the graph domain. For instance, in the core of many pattern recognition applications, there is a need to compare two objects. This operation, which is trivial when considering feature vectors defined in ℝn, is not properly defined for graphs. In this thesis, we have investigated the importance of the structural information from two perspectives, the traditional graph-based methods and the new advances on Geometric Deep Learning. On the one hand, we explore the problem of defining a graph representation and how to deal with it on a large scale and noisy scenario. On the other hand, Graph Neural Networks are proposed to first redefine a Graph Edit Distance methodologies as a metric learning problem, and second, to apply them in a real use case scenario for the detection of repetitive patterns which define tables in invoice documents. As experimental framework, we have validated the different methodological contributions in the domain of Document Image Analysis and Recognition.
- Published
- 2020
44. Optimized Distributed Subgraph Matching Algorithm Based on Partition Replication
- Author
-
Peng Pan, Jiali Bin, and Ling Yuan
- Subjects
Computer Networks and Communications ,Computer science ,graph partition ,lcsh:TK7800-8360 ,02 engineering and technology ,distributed computing ,020204 information systems ,subgraph matching ,0202 electrical engineering, electronic engineering, information engineering ,Partition (number theory) ,Electrical and Electronic Engineering ,Blossom algorithm ,020203 distributed computing ,Heuristic ,lcsh:Electronics ,Graph partition ,Rule-based system ,Partition (database) ,Graph ,Vertex (geometry) ,Hardware and Architecture ,Control and Systems Engineering ,Signal Processing ,graph indexing ,Graph indexing ,Algorithm ,MathematicsofComputing_DISCRETEMATHEMATICS - Abstract
At present, with the explosive growth of data scale, subgraph matching for massive graph data is difficult to satisfy with efficiency. Meanwhile, the graph index used in existing subgraph matching algorithm is difficult to update and maintain when facing dynamic graphs. We propose a distributed subgraph matching algorithm based on Partition Replica (noted as PR-Match) to process the partition and storage of large-scale data graphs. The PR-Match algorithm first splits the query graph into sub-queries, then assigns the sub-query to each node for sub-graph matching, and finally merges the matching results. In the PR-Match algorithm, we propose a heuristic rule based on prediction cost to select the optimal merging plan, which greatly reduces the cost of merging. In order to accelerate the matching speed of the sub-query graph, a vertex code based on the vertex neighbor label signature is proposed, which greatly reduces the search space for the subquery. As the vertex code is based on the increment, the problem that the feature-based graph index is difficult to maintain in the face of the dynamic graph is solved. An abundance of experiments on real and synthetic datasets demonstrate the high efficiency and strong scalability of the PR-Match algorithm when handling large-scale data graphs.
- Published
- 2020
45. Efficient search in graph databases using cross filtering.
- Author
-
Chun-Hee Lee and Chin-Wan Chung
- Subjects
- *
GRAPH theory , *DATABASES , *SEARCH algorithms , *ONLINE social networks , *BIOINFORMATICS , *TREE graphs , *PROBLEM solving - Abstract
Recently, graph data has been increasingly used in many areas such as bio-informatics and social networks, and a large amount of graph data is generated in those areas. As such, we need to manage such data efficiently. A basic, common problem in graph-related applications is to find graph data that contains a query (Graph Query Problem). However, since examining graph data sequentially incurs a prohibitive cost due to subgraph isomorphism testing, a novel indexing scheme is needed. A feature-based approach is generally used as a graph indexing scheme. A path structure, a tree structure, or a graph structure can be extracted from a graph database as a feature. The path feature and the tree feature can be easily managed, but have lower pruning power than the graph feature. Although the graph feature has the best pruning power, it takes too much time to match the graph feature with the query. In this paper, we propose a graph feature-based approach called a CF-Framework (Cross Filtering-Framework) to solve the graph query problem efficiently. To select the graph features that correspond to the query with a low cost, the CF-Framework makes two feature groups according to the query and filters out each group crossly (i.e., alternately) based on set properties. We then validate the efficiency of the CF-Framework through experimental results. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
46. Wheeler languages.
- Author
-
Alanko, Jarno, D'Agostino, Giovanna, Policriti, Alberto, and Prezza, Nicola
- Subjects
- *
FOREIGN language education , *LANGUAGE & languages , *FINITE state machines , *MACHINE theory - Abstract
The recently introduced class of Wheeler graphs, inspired by the Burrows-Wheeler Transform (BWT) of a given string, admits an efficient index data structure for searching for subpaths with a given path label, and lifts the applicability of the Burrows-Wheeler Transform from a single string to an entire language. In this paper we study the regular languages accepted by automata having a Wheeler graph as transition function and prove results on determinization, Myhill-Nerode characterization, decidability, and closure properties for this class of languages. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
47. Extended high dimensional indexing approach for reachability queries on very large graphs.
- Author
-
da Silva, Rodrigo Ferreira, Urrutia, Sebastián, and Hvattum, Lars Magnus
- Subjects
- *
DIRECTED acyclic graphs , *HIGH-dimensional model representation , *GRAPH connectivity , *SETUP time , *COMPUTER systems , *ACYCLIC model - Abstract
• We propose a strategy to improve FELINE's negative cut index. • We introduce LYNX, a novel indexing approach for reachability queries. • Computational results compare LYNX's performance with state-of-the-art algorithms. Given a directed acyclic graph G = (V , A) and two vertices u , v ∈ V , the reachability problem is to answer if there is a path from u to v in the graph. In the context of very large graphs, with millions of vertices and a series of queries to be answered, it is not practical to search the graph for each query. On the other hand, the storage of the full transitive closure of the graph is also impractical due to its O (| V | 2) size. Scalable approaches aim to create indices used to prune the search during its execution. Negative indices may be able to determine (in constant time) that a query has a negative answer while positive indices may determine (again in constant time) that a query has a positive answer. In this paper we propose a novel scalable approach called LYNX that uses a large number of topological sorts of G as a negative cut index without degrading the query time. A similar strategy is applied regarding a positive cut index. In addition, LYNX proposes a user-defined index size that enables the user to control the ratio between negative and positive cuts depending on the expected query pattern. We show by computational experiments that LYNX consistently outperforms the state-of-the-art approach in terms of query-time using the same index-size for graphs with high reachability ratio. In intelligent computer systems that rely on frequent tests of connectivity in graphs, LYNX can reduce the time delay experience by end users through a reduced query time. This comes at the expense of an increased setup time whenever the underlying graph is updated. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
48. Indexing Simple Graphs by Means of the Resistance Distance
- Author
-
Prabhas Chongstitvatana, Nachol Chaiyaratana, and Chatchawit Aporntewan
- Subjects
General Computer Science ,010103 numerical & computational mathematics ,02 engineering and technology ,01 natural sciences ,Distance-regular graph ,law.invention ,Combinatorics ,law ,Graph power ,Line graph ,0202 electrical engineering, electronic engineering, information engineering ,General Materials Science ,0101 mathematics ,Complement graph ,Mathematics ,Discrete mathematics ,graph isomorphism ,Resistance distance ,General Engineering ,Voltage graph ,simple connected graphs ,Graph bandwidth ,graph indexing ,resistance distance ,Electrical resistance ,020201 artificial intelligence & image processing ,Regular graph ,lcsh:Electrical engineering. Electronics. Nuclear engineering ,lcsh:TK1-9971 - Abstract
For every simple connected graph, we present a polynomial time algorithm for computing a numerical index, which is composed of primary and secondary parts. Given a graph $G=(V,E)$ where $V$ and $E$ are, respectively, vertex and edge sets, the primary part of the index is a set of $\vert V \vert $ fractions and the secondary part of the index is a set of $\vert B\vert \times \vert V\vert $ fractions, where $B$ is the partition of the vertex set $V$ . Basically, each fraction in the primary and secondary parts is the electrical resistance between two vertices when every edge in the graph is replaced with a unit resistor (1 $\Omega $ ). The experimental results show that our indexing algorithm produced a unique index for every simple connected graph with ≤10 vertices, including all graphs that are counterexamples for detecting graph isomorphism by resistance spectrum comparison. The strength of our indexing algorithm lies in its extreme simplicity. An index of a graph is solely derived from the determinants of reduced Laplacian matrices, which represent the graph. Therefore, the performance of our indexing algorithm only depends on how fast the matrix determinants can be computed.
- Published
- 2016
49. Hypergraph querying using structural indexing and layer-related-closure verification
- Author
-
Turgay Korkmaz and Xinran Yu
- Subjects
0301 basic medicine ,Hypergraph ,Theoretical computer science ,Computer science ,Search engine indexing ,02 engineering and technology ,computer.software_genre ,Graph ,Human-Computer Interaction ,03 medical and health sciences ,030104 developmental biology ,Artificial Intelligence ,Hardware and Architecture ,0202 electrical engineering, electronic engineering, information engineering ,Search problem ,Graph indexing ,020201 artificial intelligence & image processing ,Data mining ,Graph isomorphism ,computer ,Software ,MathematicsofComputing_DISCRETEMATHEMATICS ,Information Systems - Abstract
Graph indexing and querying mechanisms have been receiving significant attention due to their importance in analyzing the growing graph datasets in many domains. Although much work has been done in the context of simple graphs, they are not directly applicable to hypergraphs that represent more complex relationships in various applications. The key problem here is to search a given subhypergraph query in a larger hypergraph dataset. This search problem is known to be NP-hard as it is related to graph isomorphism. To solve this search problem in an efficient manner, we first create an index set by extracting the common subhypergraph structures from the given hypergraph dataset. Upon receiving a query, we use the same indexing techniques and create a query index set from the given subhypergraph. Utilizing both indices, we identify the possible locations of the query in the hypergraph dataset. We then start the subhypergraph search to verify whether the query really appears at each location by using an accelerated verification mechanism called layer-related-closure method. Through experiments on a real hypergraph dataset and random datasets, we demonstrate the efficiency and effectiveness of hypergraph indexing and our verification method.
- Published
- 2015
- Full Text
- View/download PDF
50. VFG – INDEX: A NOVEL GRAPH INDEXING METHOD
- Author
-
K. Vivekanandan, A. Pankaj Moses Monickaraj, and D. Ramya Chithra
- Subjects
Combinatorics ,Computer science ,Graph indexing ,Graph (abstract data type) - Abstract
The latest advancements in science and technology have observed large quantity of complicated structures and schema less data such as proteins, circuits, images, Web, and XML documents which can be modeled into various types of which one of the most dominant now a days are graphs. This has caused Graph Mining, one of the budding areas of research happening throughout. In this paper, we investigate the various algorithms for Graph Indexing which makes use of Frequent Sub graph as a key term for indexing, of which one the most dominant is FG-index algorithm. This algorithm is tested with AIDS dataset and an improved algorithm is proposed for effective indexing.
- Published
- 2014
- Full Text
- View/download PDF
Catalog
Discovery Service for Jio Institute Digital Library
For full access to our library's resources, please sign in.