Low-Resource Unsupervised NMT: Diagnosing the Problem and Providing a Linguistically Motivated Solution
- Source :
- Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, 81-90
- Publication Year :
- 2020
Abstract
- Unsupervised Machine Translation has been advancing our ability to translate without parallel data, but state-of-the-art methods assume an abundance of monolingual data. This paper investigates the scenario where monolingual data is limited as well, finding that current unsupervised methods suffer in performance under this stricter setting. We find that the performance loss originates from the poor quality of the pretrained monolingual embeddings, and we propose using linguistic information in the embedding training scheme. To support this, we look at two linguistic features that may help improve alignment quality: dependency information and sub-word information. Using dependency-based embeddings results in a complementary word representation which offers a boost in performance of around 1.5 BLEU points compared to standard WORD2VEC when monolingual data is limited to 1 million sentences per language. We also find that the inclusion of sub-word information is crucial to improving the quality of the embeddings.
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, 81-90
- Accession number :
- edsair.narcis........fe29a88937a2ab08d283d04cfc727379