Data Augmentation Under Scarce Condition for Neural Machine Translation

Authors :
Huang Heyan
Dan Luo
Shi Shumin
Rihai Su
Source :
CCIS
Publication Year :
2019
Publisher :
IEEE, 2019.

Abstract

Neural Machine Translation (NMT) has achieved state-of-the-art performance, but that performance depends on the availability of large parallel corpora. For low-resource NMT tasks, the scarcity of training data inevitably leads to poor translation quality. To reduce the dependence on the scale of the bilingual corpus and to cut down training time, we propose a novel data augmentation method for scarce-data conditions, named SMC, which Samples from the Monolingual Corpus only those sentences containing difficult words during the back-translation process, for Mongolian-Chinese (Mn-Ch) and English-Chinese (En-Ch) NMT. Inspired by work in curriculum learning, our approach takes into account the varying difficulty of the samples and the corresponding capability of the model. Experimental results show that our method improves translation quality by up to 2.4 and 1.72 BLEU points over the baselines on the En-Ch and Mn-Ch datasets, respectively, while greatly reducing training time.
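
The sampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how "difficult words" are identified, so the sketch assumes, purely for illustration, that a word is difficult when it occurs rarely in the target side of the parallel corpus. The function names `difficult_words` and `sample_monolingual`, the `max_count` threshold, and whitespace tokenization are all hypothetical choices.

```python
from collections import Counter


def difficult_words(parallel_target_sentences, max_count=1):
    """Return words seen at most `max_count` times in the parallel data.

    Assumption: low frequency is used as a stand-in for "difficulty";
    the abstract does not define the actual criterion.
    """
    counts = Counter(
        token
        for sentence in parallel_target_sentences
        for token in sentence.split()  # assumes pre-tokenized text
    )
    return {word for word, count in counts.items() if count <= max_count}


def sample_monolingual(monolingual_sentences, hard_words, limit=None):
    """Keep only monolingual sentences containing at least one hard word.

    The selected sentences are the ones that would then be
    back-translated to build synthetic parallel pairs.
    """
    selected = []
    for sentence in monolingual_sentences:
        if hard_words.intersection(sentence.split()):
            selected.append(sentence)
            if limit is not None and len(selected) >= limit:
                break
    return selected


parallel_target = ["the cat sat", "the dog sat", "the cat ran"]
monolingual = ["a dog barked", "the cat sat down", "she ran home", "dog ran"]

hard = difficult_words(parallel_target, max_count=1)
subset = sample_monolingual(monolingual, hard)
```

Here `subset` holds only the sentences that mention a rare word, so the back-translation model has far fewer sentences to process than if the whole monolingual corpus were used, which is consistent with the reduced training time the abstract reports.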

Details

Database :
OpenAIRE
Journal :
2019 IEEE 6th International Conference on Cloud Computing and Intelligence Systems (CCIS)
Accession number :
edsair.doi...........b23d2b848a97c5e7488ffdb4e7b41362