Semantically enhanced term frequency based on word embeddings for Arabic information retrieval

Authors :: Abdelkader El Mahdaouy
Said Ouatik El Alaoui
Eric Gaussier
Analyse de données, Modélisation et Apprentissage automatique [Grenoble] (AMA)
Laboratoire d'Informatique de Grenoble (LIG)
Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019])
laboratoire informatique et modélisation (LIM)
Faculté des sciences Dhar El Mahras
Université Grenoble Alpes [2016-2019] (UGA [2016-2019])
Source :: CIST, Information Science and Technology (CIST), 2016 Fourth IEEE International Colloquium, Oct 2016, Tangier, Morocco
Publication Year :: 2016
Publisher :: IEEE, 2016.
Abstract: Traditional Information Retrieval (IR) models are based on the bag-of-words paradigm, where relevance scores are computed from exact matching of keywords. Although these models already achieve good performance, it has been shown that most dissatisfaction cases in relevance are due to term mismatch between queries and documents. In this paper, we introduce a novel method to compute term frequency based on semantic similarities, using distributed representations of words in a vector space (word embeddings). Our main goal is to allow distinct but semantically related terms to match each other and contribute to the relevance scores. Hence, Arabic documents are retrieved beyond the bag-of-words paradigm, based on semantic similarities between word vectors. The results on standard Arabic TREC data sets show a significant improvement over the baseline bag-of-words models.
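The idea in the abstract, letting semantically related terms contribute to a query term's frequency, can be illustrated with a minimal sketch. This is not the authors' formula, which the record does not give. It assumes one plausible variant: each document term adds its count weighted by cosine similarity to the query term, provided the similarity reaches a cutoff. The function names, the `threshold` value, and the toy two-dimensional vectors (English words for readability) are all hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_tf(query_term, doc_tokens, embeddings, threshold=0.7):
    """Similarity-weighted term frequency (illustrative assumption only).

    Exact matches count fully; other document terms contribute
    count * cosine similarity when the similarity reaches `threshold`.
    """
    counts = Counter(doc_tokens)
    q_vec = embeddings.get(query_term)
    total = 0.0
    for term, count in counts.items():
        if term == query_term:
            total += count  # exact match, as in bag-of-words
            continue
        t_vec = embeddings.get(term)
        if q_vec is None or t_vec is None:
            continue  # no vector available: term cannot match semantically
        sim = cosine(q_vec, t_vec)
        if sim >= threshold:
            total += sim * count
    return total


# Toy embeddings (hypothetical values, not trained vectors).
EMB = {
    "car": [1.0, 0.0],
    "automobile": [0.8, 0.6],
    "banana": [0.0, 1.0],
}

doc = ["automobile", "automobile", "banana"]
exact_tf = doc.count("car")                 # bag-of-words sees no match
enhanced_tf = semantic_tf("car", doc, EMB)  # related term now contributes
print(exact_tf, enhanced_tf)
```

In this toy document the exact-match frequency of "car" is zero, while the similarity-weighted value is positive because "automobile" is close to "car" in the vector space. Such a value could then stand in for the raw term frequency inside a standard ranking function; how the paper integrates it is not stated in this record.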

Subjects :: Arabic Information Retrieval
Word embedding
Computer science
InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL
Term Mismatch
Context (language use)
Recherche d'Information en langue arabe
Appariement Sémantique
computer.software_genre
Semantics
030507 speech-language pathology & audiology
03 medical and health sciences
Relevance (information retrieval)
Distributed Representation of word Vectors
Semantic matching
Context model
Information retrieval
Semantically Enhanced Term Frequency
business.industry
Représentations Distribuées des Vecteurs des Mots
05 social sciences
Disparité des Mots
Term (time)
IR models
[INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]
Artificial intelligence
0509 other social sciences
050904 information & library sciences
0305 other medical science
business
computer
Modèles de RI
Natural language processing
Word (computer architecture)

Details

Database :: OpenAIRE
Journal :: 2016 4th IEEE International Colloquium on Information Science and Technology (CiSt)
Accession number :: edsair.doi.dedup.....298ffadbb7d37788238be5c29a9cad34
Full Text :: https://doi.org/10.1109/cist.2016.7805076
