Back to Search
Start Over
Fixed Block Compression Boosting in FM-Indexes: Theory and Practice
- Source :
- Algorithmica. 81:1370-1391
- Publication Year :
- 2018
- Publisher :
- Springer Science and Business Media LLC, 2018.
-
Abstract
- The FM index (Ferragina and Manzini in J ACM 52(4):552-581, 2005) is a widely-used compressed data structure that stores a string T in a compressed form and also supports fast pattern matching queries. In this paper, we describe new FM-index variants that combine nice theoretical properties, simple implementation and improved practical performance. Our main theoretical result is a new technique called fixed block compression boosting, which is a simpler and faster alternative to optimal compression boosting and implicit compression boosting used in previous FM-indexes. We also describe several new techniques for implementing fixed-block boosting efficiently, including a new, fast, and space-efficient implementation of wavelet trees. Our extensive experiments show the new indexes to be consistently fast and small relative to the state-of-the-art, and thus they make a good off-the-shelf choice for many applications.
- Subjects :
- Boosting (machine learning)
General Computer Science
Computer science
Text indexing
Data_CODINGANDINFORMATIONTHEORY
0102 computer and information sciences
02 engineering and technology
01 natural sciences
law.invention
Compressed data structure
Wavelet
law
Suffix array
Wavelet tree
0202 electrical engineering, electronic engineering, information engineering
Wavelet Tree
Pattern matching
FM-index
Compression boosting
Applied Mathematics
020207 software engineering
113 Computer and information sciences
Computer Science Applications
010201 computation theory & mathematics
Theory of computation
Algorithm
Subjects
Details
- ISSN :
- 14320541 and 01784617
- Volume :
- 81
- Database :
- OpenAIRE
- Journal :
- Algorithmica
- Accession number :
- edsair.doi.dedup.....77fddf2a504f06382f24d986dd416f7d