Neural machine translation with a polysynthetic low resource language

Authors :
John Ortega
Richard Alexander Castro Mamani
Kyunghyun Cho
Source :
Machine Translation. 34:325-346
Publication Year :
2020
Publisher :
Springer Science and Business Media LLC, 2020.

Abstract

Low-resource languages (LRLs) with complex morphology are known to be more difficult to translate automatically. Some LRLs are more difficult to translate than others because of a lack of research interest or collaboration. In this article, we experiment with a specific LRL, Quechua, which is spoken by millions of people in South America yet has not been the subject of a neural approach to translation until now. We improve on the latest published results, with baseline BLEU scores obtained using state-of-the-art recurrent neural network approaches for translation. Additionally, we experiment with several morphological segmentation techniques and introduce a new one in order to decompose the language’s suffix-based morphemes. We extend our work to high-resource languages (HRLs) such as Finnish and Spanish to show that Quechua, for qualitative purposes, can be considered compatible with and translatable into major European languages, with measurements comparable to those of state-of-the-art HRLs at this time. We conclude by making our two best Quechua–Spanish translation engines available online.

Details

ISSN :
1573-0573 and 0922-6567
Volume :
34
Database :
OpenAIRE
Journal :
Machine Translation
Accession number :
edsair.doi...........0b6b818909047afd0545b10cf191430e
Full Text :
https://doi.org/10.1007/s10590-020-09255-9