Back to Search
Start Over
Lien entre Transformée de Burrows-Wheeler (BWT) et BWT étendue (XBW) via l'automate d'Aho-Corasick : applications en compression par encodage des longueurs
- Source :
- Leibniz International Proceedings in Informatics (LIPIcs), 30th Annual Symposium on Combinatorial Pattern Matching (CPM), 30th Annual Symposium on Combinatorial Pattern Matching (CPM), University of Pisa, Jun 2019, Pise, Italy. pp.24:1--24:20, ⟨10.4230/LIPIcs.CPM.2019.24⟩, CPM 2019-30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019-30th Annual Symposium on Combinatorial Pattern Matching, University of Pisa, Jun 2019, Pise, Italy. pp.24:1--24:20, ⟨10.4230/LIPIcs.CPM.2019.24⟩
- Publication Year :
- 2019
- Publisher :
- Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
-
Abstract
- URN: urn:nbn:de:0030-drops-104955URL: http://drops.dagstuhl.de/opus/volltexte/2019/10495/ISBN ={978-3-95977-103-0},ISSN ={1868-8969},; International audience; The boom of genomic sequencing makes compression of sets of sequences inescapable. This underlies the need for multi-string indexing data structures that helps compressing the data. The most prominent example of such data structures is the Burrows-Wheeler Transform (BWT), a reversible permutation of a text that improves its compressibility. A similar data structure, the eXtended Burrows-Wheeler Transform (XBW), is able to index a tree labelled with alphabet symbols. A link between a multi-string BWT and the Aho-Corasick automaton has already been found and led to a way to build a XBW from a multi-string BWT. We exhibit a stronger link between a multi-string BWT and a XBW by using the order of the concatenation in the multi-string. This bijective link has several applications: first, it allows one to build one data structure from the other; second, it enables one to compute an ordering of the input strings that optimises a Run-Length measure (i.e., the compressibility) of the BWT or of the XBW.
- Subjects :
- Run Length Encoding (RLE)
000 Computer science, knowledge, general works
Aho-Corasick automaton
Aho-Corasick Tree
[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]
education
Burrows Wheeler Transform
Compression
Data_CODINGANDINFORMATIONTHEORY
0102 computer and information sciences
02 engineering and technology
XBWT
[INFO.INFO-DM]Computer Science [cs]/Discrete Mathematics [cs.DM]
Data Structures
113 Computer and information sciences
01 natural sciences
Algorithm
2012 ACM Subject Classification Mathematics of computing → Discrete mathematics
Theory of computation → Randomness, geometry and discrete structures
Theory of computation → Data structures and algorithms for data management
010201 computation theory & mathematics
Computer Science
0202 electrical engineering, electronic engineering, information engineering
Indexing
020201 artificial intelligence & image processing
[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]
Stringology
Subjects
Details
- Language :
- English
- ISSN :
- 18688969
- Database :
- OpenAIRE
- Journal :
- Leibniz International Proceedings in Informatics (LIPIcs), 30th Annual Symposium on Combinatorial Pattern Matching (CPM), 30th Annual Symposium on Combinatorial Pattern Matching (CPM), University of Pisa, Jun 2019, Pise, Italy. pp.24:1--24:20, ⟨10.4230/LIPIcs.CPM.2019.24⟩, CPM 2019-30th Annual Symposium on Combinatorial Pattern Matching, CPM 2019-30th Annual Symposium on Combinatorial Pattern Matching, University of Pisa, Jun 2019, Pise, Italy. pp.24:1--24:20, ⟨10.4230/LIPIcs.CPM.2019.24⟩
- Accession number :
- edsair.doi.dedup.....c335a150fb18d8dbd27fd1d7f9198c57