1. Neural machine translation with a polysynthetic low resource language
- Author
John Ortega, Richard Alexander Castro Mamani, and Kyunghyun Cho
- Subjects
Linguistics and Language, Machine translation, Low resource, Computer science, 02 engineering and technology, Language and Linguistics, Recurrent neural network, Artificial Intelligence, Morpheme, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, Suffix, Computational linguistics, Baseline (configuration management), Morphological segmentation, Software, Natural language processing
- Abstract
Low-resource languages (LRL) with complex morphology are known to be more difficult to translate in an automatic way. Some LRLs are particularly more difficult to translate than others due to the lack of research interest or collaboration. In this article, we experiment with a specific LRL, Quechua, that is spoken by millions of people in South America yet has not undertaken a neural approach for translation until now. We improve the latest published results with baseline BLEU scores using the state-of-the-art recurrent neural network approaches for translation. Additionally, we experiment with several morphological segmentation techniques and introduce a new one in order to decompose the language’s suffix-based morphemes. We extend our work to other high-resource languages (HRL) like Finnish and Spanish to show that Quechua, for qualitative purposes, can be considered compatible with and translatable into other major European languages with measurements comparable to the state-of-the-art HRLs at this time. We finalize our work by making our best two Quechua–Spanish translation engines available on-line.
- Published
- 2020