Research on Uyghur Pattern Matching Based on Syllable Features.

Authors :
Abliz, Wayit
Maimaiti, Maihemuti
Wu, Hao
Wushouer, Jiamila
Abiderexiti, Kahaerjiang
Yibulayin, Tuergen
Wumaier, Aishan
Source :
Information (2078-2489). May 2020, Vol. 11 Issue 5, p248. 1p.
Publication Year :
2020

Abstract

Pattern matching is widely used in fields such as information retrieval, natural language processing (NLP), data mining and network security. Research on pattern matching is also ongoing for Uyghur, a typical agglutinative, low-resource language with complex morphology, spoken by the ethnic Uyghur group in Xinjiang, China. Because of the language's characteristics, pattern matching that uses characters or words as basic units performs poorly. Two problems arise: (1) vowel weakening and (2) morphological changes caused by suffixes. To address these problems, this paper proposes a Boyer–Moore-U (BM-U) algorithm and a retrievable syllable coding format, both based on the syllable features of the Uyghur language and on an improvement of the Boyer–Moore (BM) algorithm. The algorithm uses syllable features to perform pattern matching, which effectively solves the vowel-weakening problem and better matches words whose stem shape changes. Finally, in pattern matching experiments on vowel-weakened words, the BM-U algorithm improves precision, recall, F1-measure and accuracy over the BM algorithm by 4%, 55%, 33% and 25%, respectively, on character-encoded text and by 10%, 52%, 38% and 38%, respectively, on syllable-encoded text. [ABSTRACT FROM AUTHOR]
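
As a rough illustration of matching over syllables rather than characters, the sketch below runs the Horspool simplification of Boyer–Moore on sequences of tokens. It is not the BM-U algorithm from the paper, whose details the abstract does not give. The function name `horspool_search` is hypothetical, the syllable tokens are Latin placeholders rather than real Uyghur syllables, and syllable segmentation is assumed to have been done beforehand.

```python
from typing import Dict, Hashable, List, Sequence


def horspool_search(text: Sequence[Hashable], pattern: Sequence[Hashable]) -> List[int]:
    """Return every start index at which `pattern` occurs in `text`.

    Works on any sequence of hashable symbols: a plain string (characters)
    or a list of syllable tokens. This is the Horspool variant of
    Boyer-Moore (bad-symbol shifts only), used here purely as a sketch.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # Shift table: for each symbol in pattern[:-1], the distance from its
    # last occurrence to the end of the pattern. Unknown symbols shift by m.
    shift: Dict[Hashable, int] = {}
    for i in range(m - 1):
        shift[pattern[i]] = m - 1 - i

    matches: List[int] = []
    pos = 0
    while pos <= n - m:
        # Compare the current window right to left, as Boyer-Moore does.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            matches.append(pos)
        # Slide the window by the shift of the symbol under its last position.
        pos += shift.get(text[pos + m - 1], m)
    return matches


# Character-level search on a plain string.
char_hits = horspool_search("abracadabra", "abra")

# Syllable-level search on pre-segmented placeholder tokens (assumed input).
syllable_hits = horspool_search(["ka", "la", "ma", "ka", "la"], ["ka", "la"])
```

With syllable tokens as the unit, the window can skip a whole pattern length of syllables when the symbol under its last position does not occur in the pattern. A match can also only begin at a syllable boundary.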

Details

Language :
English
ISSN :
20782489
Volume :
11
Issue :
5
Database :
Academic Search Index
Journal :
Information (2078-2489)
Publication Type :
Academic Journal
Accession number :
143674543
Full Text :
https://doi.org/10.3390/info11050248