1. Block trees
- Author
-
Djamal Belazzougui, Manuel Cáceres, Travis Gagie, Paweł Gawrychowski, Juha Kärkkäinen, Gonzalo Navarro, Alberto Ordóñez, Simon J. Puglisi, Yasuo Tabei, Department of Computer Science, Algorithmic Bioinformatics, Practical Algorithms and Data Structures on Strings research group / Juha Kärkkäinen, Helsinki Institute for Information Technology, and Bioinformatics
- Subjects
Computer Networks and Communications ,Applied Mathematics ,Compressed data structures ,0102 computer and information sciences ,02 engineering and technology ,Repetitive string collections ,Lempel-Ziv compression ,113 Computer and information sciences ,01 natural sciences ,STRINGS ,Theoretical Computer Science ,DATA-COMPRESSION ,Computational Theory and Mathematics ,010201 computation theory & mathematics ,020204 information systems ,0202 electrical engineering, electronic engineering, information engineering ,APPROXIMATION - Abstract
Let string S[1..n] be parsed into z phrases by the Lempel-Ziv algorithm. The corresponding compression algorithm encodes S in O(z) space, but it does not support random access to S. We introduce a data structure, the block tree, that represents S in O(z log(n/z)) space and extracts any symbol of S in time O(log(n/z)), among other space-time tradeoffs. The structure also supports other queries that are useful for building compressed data structures on top of S. Further, block trees can be built in linear time and in a scalable manner. Our experiments show that block trees offer relevant space-time tradeoffs compared to other compressed string representations for highly repetitive strings. (C) 2020 Elsevier Inc. All rights reserved.
- Published
- 2021