1. On the Complexity of Exact Pattern Matching in Graphs: Binary Strings and Bounded Degree
- Author
-
Equi, Massimo, Grossi, Roberto, M��kinen, Veli, Helsingin yliopisto = Helsingfors universitet = University of Helsinki, Dipartimento di Informatica [Pisa] (DI), University of Pisa - Università di Pisa, Equipe de recherche européenne en algorithmique et biologie formelle et expérimentale (ERABLE), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), The first two authors are grateful to Alessio Conte and Luca Versari for providing theircomments on the reduction. The last author wishes to thank the participants of the annualresearch group retreat on sparking the idea to studysethreductions in this context. Wethank the anonymous reviewers of an earlier version of this paper for useful suggestionsfor improving the presentation and for pointing out the connection to regular expression matching., and Sagot, Marie-France
- Subjects
FOS: Computer and information sciences ,J.3 ,F.1 ,string matching ,E.1 ,F.2.2 ,G.2.2 ,H.2.3 ,H.2.8 ,H.3.3 ,Computational Complexity (cs.CC) ,[INFO] Computer Science [cs] ,heterogeneous networks ,2012 ACM Subject Classification Mathematics of computing → Graph algorithms Theory of computation → Problems ,Computer Science - Data Structures and Algorithms ,Data Structures and Algorithms (cs.DS) ,[INFO]Computer Science [cs] ,ComputingMilieux_MISCELLANEOUS ,strong exponential time hypothesis ,graph search ,variation graphs ,labeled graphs ,exact pattern matching ,Computer Science - Computational Complexity ,string search ,graph query - Abstract
Exact pattern matching in labeled graphs is the problem of searching paths of a graph $G=(V,E)$ that spell the same string as the pattern $P[1..m]$. This basic problem can be found at the heart of more complex operations on variation graphs in computational biology, of query operations in graph databases, and of analysis operations in heterogeneous networks, where the nodes of some paths must match a sequence of labels or types. We describe a simple conditional lower bound that, for any constant $\epsilon>0$, an $O(|E|^{1 - \epsilon} \, m)$-time or an $O(|E| \, m^{1 - \epsilon})$-time algorithm for exact pattern matching on graphs, with node labels and patterns drawn from a binary alphabet, cannot be achieved unless the Strong Exponential Time Hypothesis (SETH) is false. The result holds even if restricted to undirected graphs of maximum degree three or directed acyclic graphs of maximum sum of indegree and outdegree three. Although a conditional lower bound of this kind can be somehow derived from previous results (Backurs and Indyk, FOCS'16), we give a direct reduction from SETH for dissemination purposes, as the result might interest researchers from several areas, such as computational biology, graph database, and graph mining, as mentioned before. Indeed, as approximate pattern matching on graphs can be solved in $O(|E|\,m)$ time, exact and approximate matching are thus equally hard (quadratic time) on graphs under the SETH assumption. In comparison, the same problems restricted to strings have linear time vs quadratic time solutions, respectively, where the latter ones have a matching SETH lower bound on computing the edit distance of two strings (Backurs and Indyk, STOC'15)., Comment: Using Lemma 12 and Lemma 13 might to be enough to prove Lemma 14. However, the proof of Lemma 14 is correct if you assume that the graph used in the reduction is a DAG. Hence, since the problem is already quadratic for a DAG and a binary alphabet, it has to be quadratic also for a general graph and a binary alphabet
- Published
- 2019