16 results on '"Graph Indexing"'
Search Results
2. MSQ-Index: A Succinct Index for Fast Graph Similarity Search.
- Author
-
Chen, Xiaoyang, Huo, Hongwei, Huan, Jun, Vitter, Jeffrey Scott, Zheng, Weiguo, and Zou, Lei
- Subjects
MOLECULAR graphs ,PATTERN recognition systems ,DATA mining ,SOURCE code ,CHEMICAL structure ,DATA structures - Abstract
Graph similarity search under the graph edit distance constraint has received considerable attention in many applications, such as bioinformatics, data mining, pattern recognition and social networks. Existing methods for this problem have limited scalability because of the huge amount of memory they consume when handling very large graph databases with tens of millions of graphs. In this article, we present a succinct index that incorporates succinct data structures and hybrid encoding to achieve improved query time performance with minimal space usage. Specifically, the space usage of our index requires only 5–15 percent of the previous state-of-the-art indexing size while at the same time achieving several times acceleration in query time on the tested data. We also improve the query performance by augmenting the global filter with range searching, which allows us to perform similarity search in a reduced region. In addition, we propose two effective lower bounds together with a boosting technique to obtain the smallest possible candidate set. Extensive experiments demonstrate that our proposed approach is superior both in space and filtering to the state-of-the-art approaches. To the best of our knowledge, our index is the first in-memory index for this problem that successfully scales to cope with the large dataset of 25 million chemical structure graphs from the PubChem dataset. The source code is available online. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
3. GRAPES-DD: exploiting decision diagrams for index-driven search in biological graph databases.
- Author
-
Licheri, Nicola, Bonnici, Vincenzo, Beccuti, Marco, and Giugno, Rosalba
- Subjects
BIOLOGICAL databases ,CHARTS, diagrams, etc. ,ELECTRIC power filters ,MORPHOLOGY ,DATA structures ,PATTERN matching ,RDF (Document markup language) - Abstract
Background: Graphs are mathematical structures widely used for expressing relationships among elements when representing biomedical and biological information. On top of these representations, several analyses are performed. A common task is the search of one substructure within one graph, called target. The problem is referred to as one-to-one subgraph search, and it is known to be NP-complete. Heuristics and indexing techniques can be applied to facilitate the search. Indexing techniques are also exploited in the context of searching in a collection of target graphs, referred to as one-to-many subgraph problem. Filter-and-verification methods that use indexing approaches provide a fast pruning of target graphs or parts of them that do not contain the query. The expensive verification phase is then performed only on the subset of promising targets. Indexing strategies extract graph features at a sufficient granularity level for performing a powerful filtering step. Features are memorized in data structures allowing an efficient access. Indexing size, querying time and filtering power are key points for the development of efficient subgraph searching solutions. Results: An existing approach, GRAPES, has been shown to have good performance in terms of speed-up for both one-to-one and one-to-many cases. However, it suffers in the size of the built index. For this reason, we propose GRAPES-DD, a modified version of GRAPES in which the indexing structure has been replaced with a Decision Diagram. Decision Diagrams are a broad class of data structures widely used to encode and manipulate functions efficiently. Experiments on biomedical structures and synthetic graphs have confirmed our expectation showing that GRAPES-DD has substantially reduced the memory utilization compared to GRAPES without worsening the searching time. Conclusion: The use of Decision Diagrams for searching in biochemical and biological graphs is completely new and potentially promising thanks to their ability to encode compactly sets by exploiting their structure and regularity, and to manipulate entire sets of elements at once, instead of exploring each single element explicitly. Search strategies based on Decision Diagram makes the indexing for biochemical graphs, and not only, more affordable allowing us to potentially deal with huge and ever growing collections of biochemical and biological structures. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
4. Answering Reachability Queries on Incrementally Updated Graphs by Hierarchical Labeling Schema.
- Author
-
Wong, Tak-Lam
- Subjects
DIRECTED acyclic graphs ,GRAPH theory - Abstract
We proposed a novel solution schema called the Hierarchical Labeling Schema (HLS) to answer reachability queries in directed graphs. Different from many existing approaches that focus on static directed acyclic graphs (DAGs), our schema focuses on directed cyclic graphs (DCGs) where vertices or arcs could be added to a graph incrementally. Unlike many of the traditional approaches, HLS does not require the graph to be acyclic in constructing its index. Therefore, it could, in fact, be applied to both DAGs and DCGs. When vertices or arcs are added to a graph, the HLS is capable of updating the index incrementally instead of re-computing the index from the scratch each time, making it more efficient than many other approaches in the practice. The basic idea of HLS is to create a tree for each vertex in a graph and link the trees together so that whenever two vertices are given, we can immediately know whether there is a path between them by referring to the appropriate trees. We conducted extensive experiments on both real-world datasets and synthesized datasets. We compared the performance of HLS, in terms of index construction time, query processing time and space consumption, with two state-of-the-art methodologies, the path-tree method and the 3-hop method. We also conducted simulations to model the situation when a graph is updated incrementally. The performance comparison of different algorithms against HLS on static graphs has also been studied. Our results show that HLS is highly competitive in the practice and is particularly useful in the cases where the graphs are updated frequently. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
5. Degree Reduction in Labeled Graph Retrieval.
- Author
-
Ullmann, Julian R.
- Subjects
CONSTRAINT satisfaction ,SUBGRAPHS - Abstract
Within a given collection of graphs, a graph retrieval system may seek all graphs that contain a given graph, or may instead seek all graphs that are contained within a given graph. Although subgraph isomorphism is worst-case exponential, it may be average-case polynomial if graphs are labeled so as to restrict possible correspondences between vertices of included and includer graphs. Degree reduction is a procedure that uses logical inference to preclude some such correspondences, thereby substantially increasing the size of includer graphs that can be processed, without preventing any existent isomorphism from being found. Degree reduction works only with labeled graphs, which may be directed or undirected, with or without edge labels. Inexact or approximate isomorphism is accommodated by reducing strictness of conditions for perfect isomorphism. Disk-based degree reduction, which is an order of magnitude slower than memory-based degree reduction, has successfully processed graphs that have millions of vertices. Although the principle of degree reduction is simple and fundamental, its efficient practical implementation involves intricate procedural detail. Its average-case complexity analysis is currently intractable, so cost-benefit assessment has to be experimental. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
6. Degree Reduction in Labeled Graph Retrieval.
- Author
-
Ullmann, Julian R.
- Subjects
COMPLETE graphs ,ISOMORPHISM (Mathematics) ,COST effectiveness - Abstract
Within a given collection of graphs, a graph retrieval system may seek all graphs that contain a given graph, or may instead seek all graphs that are contained within a given graph. Although subgraph isomorphism is worst-case exponential, it may be average-case polynomial if graphs are labeled so as to restrict possible correspondences between vertices of included and includer graphs. Degree reduction is a procedure that uses logical inference to preclude some such correspondences, thereby substantially increasing the size of includer graphs that can be processed, without preventing any existent isomorphism from being found. Degree reduction works only with labeled graphs, which may be directed or undirected, with or without edge labels. Inexact or approximate isomorphism is accommodated by reducing strictness of conditions for perfect isomorphism. Disk-based degree reduction, which is an order of magnitude slower than memory-based degree reduction, has successfully processed graphs that have millions of vertices. Although the principle of degree reduction is simple and fundamental, its efficient practical implementation involves intricate procedural detail. Its average-case complexity analysis is currently intractable, so cost-benefit assessment has to be experimental. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
7. Degree Reduction in Labeled Graph Retrieval.
- Author
-
Ullmann, Julian R.
- Subjects
GRAPH theory ,ISOMORPHISM (Mathematics) ,POLYNOMIALS ,APPROXIMATION theory ,COST effectiveness - Abstract
Within a given collection of graphs, a graph retrieval system may seek all graphs that contain a given graph, or may instead seek all graphs that are contained within a given graph. Although subgraph isomorphism is worst-case exponential, it may be average-case polynomial if graphs are labeled so as to restrict possible correspondences between vertices of included and includer graphs. Degree reduction is a procedure that uses logical inference to preclude some such correspondences, thereby substantially increasing the size of includer graphs that can be processed, without preventing any existent isomorphism from being found. Degree reduction works only with labeled graphs, which may be directed or undirected, with or without edge labels. Inexact or approximate isomorphism is accommodated by reducing strictness of conditions for perfect isomorphism. Disk-based degree reduction, which is an order of magnitude slower than memory-based degree reduction, has successfully processed graphs that have millions of vertices. Although the principle of degree reduction is simple and fundamental, its efficient practical implementation involves intricate procedural detail. Its average-case complexity analysis is currently intractable, so cost-benefit assessment has to be experimental. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
8. DSI: A Method for Indexing Large Graphs Using Distance Set.
- Author
-
Kou, Yubo, Li, Yukun, and Meng, Xiaofeng
- Abstract
Recent years we have witnessed a great increase in modeling data as large graphs in multiple domains, such as XML, the semantic web, social network. In these circumstances, researchers are interested in querying the large graph like that: Given a large graph G, and a query Q, we report all the matches of Q in G. Since subgraph isomorphism checking is proved to be NP-Complete[1], it is infeasible to scan the whole large graph for answers, especially when the query΄s size is also large. Hence, the ″filter-verification″ approach is widely adopted. In this approach, researchers first index the neighborhood of each vertex in the large graph, then filter vertexes , and finally perform subgraph matching algorithms. Previous techniques mainly focus on efficient matching algorithms, paying little attention to indexing techniques. However, appropriate indexing techniques could help improve the efficiency of query response by generating less candidates. In this paper we investigate indexing techniques on large graphs, and propose an index structure DSI(Distance Set Index) to capture the neighborhood of each vertex. Through our distance set index, more vertexes could be pruned, resulting in a much smaller search space. Then a subgraph matching algorithm is performed in the search space. We have applied our index structure to real datasets and synthetic datasets. Extensive experiments demonstrate the efficiency and effectiveness of our indexing technique. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
9. Graph Indexing and Retrieval Based on Median Graphs.
- Author
-
Serratosa, Francesc, Solé-Ribalta, Albert, and Vidiella, Enric
- Abstract
M-trees are used to organize and define fast queries on large databases of Attributed Graphs. In classical schemes based on metric trees, the routing information stored in a routing tree node is a selected Attributed Graph from the sub-cluster the node represents. Depending on the sub-cluster and the application, it is difficult to select a good representative of the sub-cluster. To that aim, we propose to use Generalized Median Graphs as the main information kept in the routing nodes of the m-tree. Experimental validation shows that in database queries, the decrease of the nodes explored in the m-tree while using a Generalized Median Graph is about 20% respect using a selected Attributed Graph. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
10. Efficient processing of $$k$$ -hop reachability queries.
- Author
-
Cheng, James, Shang, Zechao, Cheng, Hong, Wang, Haixun, and Yu, Jeffrey
- Abstract
We study the problem of answering k -hop reachability queries in a directed graph, i.e., whether there exists a directed path of length $$k$$ , from a source query vertex to a target query vertex in the input graph. The problem of $$k$$ -hop reachability is a general problem of the classic reachability (where $$k=\infty $$ ). Existing indexes for processing classic reachability queries, as well as for processing shortest path distance queries, are not applicable or not efficient for processing $$k$$ -hop reachability queries. We propose an efficient index for processing $$k$$ -hop reachability queries. Our experimental results on a wide range of real datasets show that our method is efficient and scalable in terms of both index construction and query processing. [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
11. Incremental Maintenance of the Minimum Bisimulation of Cyclic Graphs.
- Author
-
Deng, Jintian, Choi, Byron, Xu, Jianliang, Hu, Haibo, and Bhowmick, Sourav S.
- Subjects
COMPUTER maintenance & repair ,BISIMULATION ,GRAPH theory ,APPLICATION software ,SEMANTIC Web ,ALGORITHMS - Abstract
There have been numerous recent applications of graph databases (e.g., the Semantic Web, ontology representation, social networks, XML, chemical databases, and biological databases). A fundamental structural index for data graphs, namely minimum bisimulation, has been reported useful for efficient path query processing and optimization including selectivity estimation, among many others. Data graphs are subject to change and their indexes are updated accordingly. This paper studies the incremental maintenance problem of the minimum bisimulation of a possibly cyclic data graph. While cyclic graphs are ubiquitous among the data on the web, previous work on the maintenance problem has mostly focused on acyclic graphs. To study the problem with cyclic graphs, we first show that the two existing classes of minimization algorithms—merging algorithm and partition refinement—have their strengths and weaknesses. Second, we propose a novel hybrid algorithm and its analytical model. This algorithm supports an edge insertion or deletion and two forms of batch insertions or deletions. To the best of our knowledge, this is the first maintenance algorithm that guarantees minimum bisimulation of cyclic graphs. Third, we propose to partially reuse the minimum bisimulation before an update in order to optimize maintenance performance. We present an experimental study on both synthetic and real-data graphs that verified the efficiency and effectiveness of our algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
12. Fast graph query processing with a low-cost index.
- Author
-
Cheng, James, Ke, Yiping, Fu, Ada, and Yu, Jeffrey
- Abstract
This paper studies the problem of processing supergraph queries, that is, given a database containing a set of graphs, find all the graphs in the database of which the query graph is a supergraph. Existing works usually construct an index and performs a filtering-and-verification process, which still requires many subgraph isomorphism testings. There are also significant overheads in both index construction and maintenance. In this paper, we design a graph querying system that achieves both fast indexing and efficient query processing. The index is constructed by a simple but fast method of extracting the commonality among the graphs, which does not involve any costly operation such as graph mining. Our query processing has two key techniques, direct inclusion and filtering. Direct inclusion allows partial query answers to be included directly without candidate verification. Our filtering technique further reduces the candidate set by operating on a much smaller projected database. Experimental results show that our method is significantly more efficient than the existing works in both indexing and query processing, and our index has a low maintenance cost. [ABSTRACT FROM AUTHOR]
- Published
- 2011
- Full Text
- View/download PDF
13. Path-Tree: An Efficient Reachability Indexing Scheme for Large Directed Graphs.
- Author
-
Rouming Jin, Ning Ruan, Yang Xiang, and Haixun Wang
- Subjects
INDEXING ,GRAPHIC methods ,QUERY (Information retrieval system) ,GRAPH labelings ,ALGORITHMS ,PATH analysis (Statistics) - Abstract
Reachability query is one of the fundamental queries in graph database. The main idea behind answering reachability queries is to assign vertices with certain labels such that the reachability between any two vertices can be determined by the labeling information. Though several approaches have been proposes for building these reachability labels, it remains open issues on how to handle increasingly large number of vertices in real-world graphs, and how to find the best tradeoff among the labeling size, the query answering time, and the construction time. In this article, we introduce a novel graph structure, referred to as pathtree, to help labeling very large graphs. The path-tree cover is a spanning subgraph of G in a tree shape. We show path-tree can be generalized to chain-tree which theoretically can has smaller labeling cost. On top of path-tree and chain-tree index, we also introduce a new compression scheme which groups vertices with similar labels together to further reduce the labeling size. In addition, we also propose an efficient incremental update algorithm for dynamic index maintenance. Finally, we demonstrate both analytically and empirically the effectiveness and efficiency of our new approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2011
- Full Text
- View/download PDF
14. Efficient algorithms for supergraph query processing on graph databases.
- Author
-
Shuo Zhang, Xiaofeng Gao, Weili Wu, Jianzhong Li, and Hong Gao
- Abstract
We study the problem of processing supergraph queries on graph databases. A graph database D is a large set of graphs. A supergraph query q on D is to retrieve all the graphs in D such that q is a supergraph of them. The large number of graphs in databases and the NP-completeness of subgraph isomorphism testing make it challenging to efficiently processing supergraph queries. In this paper, a new approach to processing supergraph queries is proposed. Specifically, a method for compactly organizing graph databases is first presented. Common subgraphs of the graphs in a database are stored only once in the compact organization of the database, in order to reduce the overall cost of subgraph isomorphism testings from the stored graphs to queries during query processing. Then, an exact algorithm and an approximate algorithm for generating the significant feature set with optimal order are proposed, followed by the algorithms for indices construction on graph databases. The optimal order on the feature set is to reduce the number of subgraph isomorphism testings during query processing. Based on the compact organization of graph databases, a novel algorithm for testing subgraph isomorphisms from multiple graphs to one graph is presented. Finally, based on all the above techniques, a query processing method is proposed. Analytical and experimental results show that the proposed algorithms outperform the existing similar algorithms by one to two orders of magnitude. [ABSTRACT FROM AUTHOR]
- Published
- 2011
- Full Text
- View/download PDF
15. Optimized Distributed Subgraph Matching Algorithm Based on Partition Replication.
- Author
-
Yuan, Ling, Bin, Jiali, and Pan, Peng
- Subjects
PARALLEL algorithms ,GRAPH algorithms ,DISTRIBUTED algorithms ,DATA warehousing ,QUERY (Information retrieval system) ,ALGORITHMS ,ERROR-correcting codes - Abstract
At present, with the explosive growth of data scale, subgraph matching for massive graph data is difficult to satisfy with efficiency. Meanwhile, the graph index used in existing subgraph matching algorithm is difficult to update and maintain when facing dynamic graphs. We propose a distributed subgraph matching algorithm based on Partition Replica (noted as PR-Match) to process the partition and storage of large-scale data graphs. The PR-Match algorithm first splits the query graph into sub-queries, then assigns the sub-query to each node for sub-graph matching, and finally merges the matching results. In the PR-Match algorithm, we propose a heuristic rule based on prediction cost to select the optimal merging plan, which greatly reduces the cost of merging. In order to accelerate the matching speed of the sub-query graph, a vertex code based on the vertex neighbor label signature is proposed, which greatly reduces the search space for the subquery. As the vertex code is based on the increment, the problem that the feature-based graph index is difficult to maintain in the face of the dynamic graph is solved. An abundance of experiments on real and synthetic datasets demonstrate the high efficiency and strong scalability of the PR-Match algorithm when handling large-scale data graphs. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
16. Efficiently Indexing Large Sparse Graphs for Similarity Search.
- Author
-
Wang, Guoren, Wang, Bin, Yang, Xiaochun, and Yu, Ge
- Subjects
PROTEIN-protein interactions ,GRAPHIC methods ,GRAPH theory ,NATURAL language processing ,ISOMORPHISM (Mathematics) ,DIRECTED acyclic graphs ,SEARCH algorithms - Abstract
The graph structure is a very important means to model schemaless data with complicated structures, such as protein-protein interaction networks, chemical compounds, knowledge query inferring systems, and road networks. This paper focuses on the index structure for similarity search on a set of large sparse graphs and proposes an efficient indexing mechanism by introducing the Q-Gram idea. By decomposing graphs to small grams (organized by κ-Adjacent Tree patterns) and pairing-up on those κ-Adjacent Tree patterns, the lower bound estimation of their edit distance can be calculated for candidate filtering. Furthermore, we have developed a series of techniques for inverted index construction and online query processing. By building the candidate set for the query graph before the exact edit distance calculation, the number of graphs need to proceed into exact matching can be greatly reduced. Extensive experiments on real and synthetic data sets have been conducted to show the effectiveness and efficiency of the proposed indexing mechanism. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
Catalog
Discovery Service for Jio Institute Digital Library
For full access to our library's resources, please sign in.