1. Combining Bilingual Lexicons Extracted from Comparable Corpora: the Complementary Approach between Word Embedding and Text Mining
- Author
Sourour Belhaj Rhouma, Chiraz Latiri, Catherine Berrut; Laboratoire d'Informatique de Grenoble (LIG), Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP) - Centre National de la Recherche Scientifique (CNRS) - Université Grenoble Alpes [2016-2019] (UGA [2016-2019]); Modélisation et Recherche d’Information Multimédia [Grenoble] (MRIM)
- Subjects
Word embedding, Computer science, Bilingual dictionary, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, Context (language use), 0102 computer and information sciences, 02 engineering and technology, ComputingMethodologies_ARTIFICIALINTELLIGENCE, 01 natural sciences, Focus (linguistics), ComputingMethodologies_PATTERNRECOGNITION, Text mining, 010201 computation theory & mathematics, [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR], ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Artificial intelligence, Natural language processing, ComputingMilieux_MISCELLANEOUS
- Abstract
Several approaches to bilingual lexicon extraction from comparable corpora have recently been proposed. This paper shows how to combine different methods for bilingual lexicon extraction, based on standard context vectors and advanced text mining methods. Specifically, we focus on combining bilingual lexicons built from context vectors, association rules and contextual meta-rules. Combining the lexicons yields a less sparse representation, from which the most effective translations are extracted to create an optimal bilingual lexicon. An experimental validation on two language pairs from the CLEF 2003 evaluation campaign shows that combining the models gives a significant improvement over the standard approach.
- Published
- 2018