Back to Search
Start Over
A new compression algorithm for fast text search
- Source :
- Volume: 24, Issue: 5 4355-4367, Turkish Journal of Electrical Engineering and Computer Science
- Publication Year :
- 2016
- Publisher :
- The Scientific and Technological Research Council of Turkey (TUBITAK-ULAKBIM) - DIGITAL COMMONS JOURNALS, 2016.
-
Abstract
- We propose a new compression algorithm that compresses plain texts by using a dictionary-based model and a compressed string-matching approach that can be used with the compressed texts produced by this algorithm. The compression algorithm (CAFTS) can reduce the size of the texts to approximately 41% of their original sizes. The presented compressed string matching approach (SoCAFTS), which can be used with any of the known pattern matching algorithms, is compared with a powerful compressed string matching algorithm (ETDC) and a compressed string-matching tool (Lzgrep). Although the search speed of ETDC is very good in short patterns, it can only search for exact words and its compression performance differs from one natural language to another because of its word-based structure. Our experimental results show that SoCAFTS is a good solution when it is necessary to search for long patterns in a compressed document.
- Subjects :
- Lossless compression
Incremental encoding
Texture compression
Compressed string matching,text compression,dictionary-based compression,exact pattern matching,CAFTS
General Computer Science
business.industry
Computer science
010102 general mathematics
Commentz-Walter algorithm
Full text search
Pattern recognition
Data_CODINGANDINFORMATIONTHEORY
01 natural sciences
Artificial intelligence
0101 mathematics
Electrical and Electronic Engineering
business
Text compression
Image compression
Data compression
Subjects
Details
- ISSN :
- 13036203 and 13000632
- Volume :
- 24
- Database :
- OpenAIRE
- Journal :
- TURKISH JOURNAL OF ELECTRICAL ENGINEERING & COMPUTER SCIENCES
- Accession number :
- edsair.doi.dedup.....0733adc7815bbcb6697fa30808979e4b
- Full Text :
- https://doi.org/10.3906/elk-1407-178