
Research on the Uyghur morphological segmentation model with an attention mechanism.

Authors :
Abudouwaili, Gulinigeer
Abiderexiti, Kahaerjing
Shen, Yunfei
Wumaier, Aishan
Source :
Connection Science. Mar2022, Vol. 34 Issue 1, p2577-2596. 20p.
Publication Year :
2022

Abstract

Morphological segmentation is a basic task in information processing for agglutinative languages: it divides words into morphemes, the smallest semantic units. There are two types of morphological segmentation: canonical segmentation and surface segmentation. For Uyghur, a typical agglutinative language, canonical segmentation usually relies on statistical methods that require manual feature extraction. In surface segmentation, neural networks avoid the manual feature extraction process. However, to date, no model provides both segmentation results for Uyghur without added features. In addition, morphological segmentation is usually treated as a sequence labeling task, so label imbalance easily occurs in datasets. To address these issues, this paper proposes an improved labeling scheme that joins morphological boundary labels and voice harmony labels so that both kinds of segmentation are produced simultaneously. A convolutional network and an attention mechanism are then added to capture local and global features, respectively. Finally, morphological segmentation is treated as a sequence labeling task over character sequences. Because of label proportion imbalance and noise in the dataset, a focal loss function with label smoothing is used. The experimental results show that the F1 values of canonical segmentation and surface segmentation achieve the best results. [ABSTRACT FROM AUTHOR]
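
The abstract names a focal loss function with label smoothing but does not give its form. The sketch below shows one common way the two ideas can be combined for a single character position; it is an illustration under stated assumptions, not the authors' formulation. The function names, the default values `gamma=2.0` and `epsilon=0.1`, and the uniform smoothing over all labels are all assumptions introduced here.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss_label_smoothing(logits, target, gamma=2.0, epsilon=0.1):
    """Focal loss over a smoothed target distribution for one position.

    Assumed (hypothetical) formulation:
        q_k  = (1 - epsilon) * [k == target] + epsilon / K
        loss = -sum_k q_k * (1 - p_k)**gamma * log(p_k)
    gamma down-weights labels the model already predicts confidently,
    which is the usual motivation for focal loss under label imbalance;
    epsilon spreads a little target mass over all labels to soften
    the effect of noisy annotations.
    """
    probs = softmax(logits)
    num_labels = len(probs)
    loss = 0.0
    for k, p in enumerate(probs):
        q = (1.0 - epsilon) * (1.0 if k == target else 0.0) + epsilon / num_labels
        loss -= q * (1.0 - p) ** gamma * math.log(max(p, 1e-12))
    return loss


# Scores for three hypothetical labels at one character; label 0 is correct.
example_logits = [2.0, 0.5, -1.0]
example_loss = focal_loss_label_smoothing(example_logits, target=0)
```

With `gamma=0` and `epsilon=0` this reduces to ordinary cross-entropy, which gives a simple sanity check on the implementation.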

Details

Language :
English
ISSN :
09540091
Volume :
34
Issue :
1
Database :
Academic Search Index
Journal :
Connection Science
Publication Type :
Academic Journal
Accession number :
161161414
Full Text :
https://doi.org/10.1080/09540091.2022.2134843