TiCA

Authors :: Renqing Nuobu
Chen Shuo
Nima Zhaxi
Suonan Jiancuo
Source :: AIAM
Publication Year :: 2020
Publisher :: ACM, 2020.
Abstract: This paper proposes a Tibetan text compression algorithm (TiCA) based on the fact that each Tibetan syllable is composed of one to seven components, each of which has a unique Unicode encoding. First, a fault-tolerant mapping dictionary is built through statistical analysis of a 20G Tibetan text corpus and used as the dictionary of TiCA. TiCA then compresses Tibetan text according to this dictionary by mapping the original multi-component code of each syllable to a single code. Finally, an experimental comparison shows that the proposed algorithm achieves excellent results in both compression rate and time consumption.
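
The dictionary-mapping idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the use of the tsheg (U+0F0B) as the syllable delimiter, the choice of Private Use Area code points as the "single code", and the pass-through handling of syllables missing from the dictionary are all assumptions made for illustration. The record does not say how TiCA's fault tolerance or code assignment actually work.

```python
from collections import Counter

TSHEG = "\u0f0b"     # Tibetan intersyllabic mark, assumed here to delimit syllables
PUA_START = 0xE000   # start of the BMP Private Use Area (assumed target code range)
PUA_SIZE = 0x1900    # U+E000..U+F8FF provides 6400 code points


def build_dictionary(corpus: str) -> dict:
    """Map the most frequent multi-component syllables to single code points."""
    # Only syllables longer than one code point can shrink when replaced.
    counts = Counter(s for s in corpus.split(TSHEG) if len(s) > 1)
    return {
        syllable: chr(PUA_START + i)
        for i, (syllable, _) in enumerate(counts.most_common(PUA_SIZE))
    }


def compress(text: str, mapping: dict) -> str:
    """Replace each known syllable with its single code point."""
    # Assumption: syllables absent from the dictionary pass through unchanged.
    return TSHEG.join(mapping.get(s, s) for s in text.split(TSHEG))


def decompress(data: str, mapping: dict) -> str:
    """Invert compress(); assumes the original text contains no PUA characters."""
    reverse = {code: syllable for syllable, code in mapping.items()}
    return TSHEG.join(reverse.get(s, s) for s in data.split(TSHEG))
```

Under these assumptions, `decompress(compress(text, mapping), mapping)` returns the original `text`, and every syllable found in the dictionary shrinks from several code points to one.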

Subjects :: Text corpus
0209 industrial biotechnology
Computer science
Data compression ratio
Data_CODINGANDINFORMATIONTHEORY
02 engineering and technology
Unicode
020901 industrial engineering & automation
Encoding (memory)
Component (UML)
0202 electrical engineering, electronic engineering, information engineering
Code (cryptography)
020201 artificial intelligence & image processing
Syllable
Algorithm
Text compression

Database :: OpenAIRE
Journal :: Proceedings of the 2nd International Conference on Artificial Intelligence and Advanced Manufacture
Accession number :: edsair.doi...........665d19c20e9fb8cc734e86d18f7e38a5
Full Text :: https://doi.org/10.1145/3421766.3421868