Back to Search Start Over

BDD-BASED ANALYSIS OF GAPPED q-GRAM FILTERS

Authors :
Stefan Burkhardt
Marc Fontaine
Juha Kärkkäinen
Source :
International Journal of Foundations of Computer Science. 16:1121-1134
Publication Year :
2005
Publisher :
World Scientific Pub Co Pte Lt, 2005.

Abstract

Recently, there has been a surge of interest in gapped q-gram filters for approximate string matching. Important design parameters for filters are for example the value of q, the filter-threshold and in particular the shape (aka seed) of the filter. A good choice of parameters can improve the performance of a q-gram filter by orders of magnitude and optimizing these parameters is a nontrivial combinatorial problem. We describe a new method for analyzing gapped q-gram filters. This method is simple and generic. It applies to a variety of filters, overcomes many restrictions that are present in existing algorithms and can easily be extended to new filter variants. To implement our approach, we use an extended version of BDDs (Binary Decision Diagrams), a data structure that efficiently represents sets of bit-strings. In a second step, we define a new class of multi-shape filters and analyze these filters with the BDD-based approach. Experiments show that multi-shape filters can outperform the best single-shape filters, which are currently in use, in many aspects. The BDD-based algorithm is crucial for the design and analysis of these new and better multi-shape filters. Our results apply to the k-mismatches problem, i.e. approximate string matching with Hamming distance.

Details

ISSN :
17936373 and 01290541
Volume :
16
Database :
OpenAIRE
Journal :
International Journal of Foundations of Computer Science
Accession number :
edsair.doi...........dbe55923ce949a4bf28db634f27393c4