Natural-Language Text Compression Using Reverse Multi-Delimiter Codes.

Authors :
Anisimov, A. V.
Zavadskyi, I. O.
Chudakov, T. S.
Source :
Cybernetics & Systems Analysis. Jan 2024, Vol. 60 Issue 1, p1-12. 12p.
Publication Year :
2024

Abstract

This paper studies binary reverse multi-delimiter (RMD) data compression codes. RMD codes have a range of useful properties, such as unique decodability, completeness, universality, synchronizability, recognition using a finite automaton, and the ability for rapid data retrieval within an encoded file. The authors have constructed a simple monotonic mapping from the set of non-negative integers to the codeword set. Based on this mapping, they have developed a fast byte-aligned decoding algorithm. Computer experiments demonstrate that RMD codes can be decoded nearly as quickly as the SCDC code and several times faster than the Fibonacci code. Compared to known codes of a similar type, RMD codes exhibit a better compression ratio for natural language texts (more than four times closer to the entropy bound than SCDC). Additionally, the paper describes a technology for preprocessing natural language texts, which, combined with encoding using RMD codes, enhances the efficiency of powerful modern archivers. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
1060-0396
Volume :
60
Issue :
1
Database :
Academic Search Index
Journal :
Cybernetics & Systems Analysis
Publication Type :
Academic Journal
Accession number :
175458897
Full Text :
https://doi.org/10.1007/s10559-024-00641-2