1. Enhancing isomorphism between word embedding spaces for distant languages bilingual lexicon induction.
- Author
Ding, Qiuyu, Cao, Hailong, Feng, Zihao, and Zhao, Tiejun
- Subjects
ISOMORPHISM (Mathematics), LEXICON, SYNONYMS, TRANSLATING & interpreting, VOCABULARY
- Abstract
Most bilingual lexicon induction (BLI) models learn a mapping function that transfers word embedding (WE) spaces from one language to another. This usually relies on the isomorphism hypothesis, which posits that words in different languages share the same structures and relationships (i.e., their embedding spaces are similar in geometric structure). However, the isomorphism of WE spaces weakens substantially for distant language pairs, resulting in low BLI accuracy. To address this problem, we propose a novel BLI method that incorporates synonym knowledge. The main idea is to stabilize the distances between words in order to optimize the monolingual WE space, yielding higher isomorphism. Specifically, we first induce monolingual synonym pairs from WordNet and construct monolingual synonym lexicons. We then generate pseudo-sentences by substituting words in the training corpus with their synonyms. Finally, the original sentences and the pseudo-sentences are used jointly to train monolingual WEs, so that the word vectors of synonyms naturally lie closer together. Comprehensive experiments on standard BLI datasets covering diverse distant language pairs demonstrate that our method significantly outperforms strong BLI systems in word translation. [ABSTRACT FROM AUTHOR]
- Published
2024
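The pseudo-sentence step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `make_pseudo_sentences`, the `sub_prob` parameter, the whitespace tokenization, and the toy synonym lexicon are all assumptions made for illustration. In the paper the lexicon is induced from WordNet, and the combined corpus would then be passed to an embedding trainer.

```python
import random


def make_pseudo_sentences(sentences, synonyms, sub_prob=1.0, rng=None):
    """Build pseudo-sentences by swapping words for listed synonyms.

    Hypothetical helper: `synonyms` maps a lowercase word to a list of
    synonyms. A sentence yields a pseudo-sentence only if at least one
    of its tokens was actually replaced.
    """
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    pseudo = []
    for sentence in sentences:
        out, replaced = [], False
        for tok in sentence.split():  # naive whitespace tokenization (assumption)
            candidates = synonyms.get(tok.lower())
            if candidates and rng.random() < sub_prob:
                out.append(rng.choice(candidates))
                replaced = True
            else:
                out.append(tok)
        if replaced:
            pseudo.append(" ".join(out))
    return pseudo


# Toy synonym lexicon and corpus, invented for this example.
synonyms = {"big": ["large"], "car": ["automobile"]}
corpus = ["the big car stopped", "nothing to replace here"]

pseudo = make_pseudo_sentences(corpus, synonyms)
# Original and pseudo-sentences are used jointly as embedding training data.
training_corpus = corpus + pseudo
```

Because a synonym then appears in the same contexts as the word it replaces, a distributional embedding model trained on `training_corpus` would be expected to place the two vectors closer together, which is the effect the abstract describes.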