A Fistful of Vectors: A Tool for Intrinsic Evaluation of Word Embeddings.

Authors :
Ascari, Roberto
Giabelli, Anna
Malandri, Lorenzo
Mercorio, Fabio
Mezzanzanica, Mario
Source :
Cognitive Computation; May 2024, Vol. 16 Issue 3, p949-963, 15p
Publication Year :
2024

Abstract

Word embeddings (models computed through Neural Network architectures that encode words as vectors) have seen rapid growth across Natural Language Processing applications, including semantic analysis, information retrieval, dependency parsing, question answering, and machine translation. Performance on these tasks is closely tied to the quality of the embeddings, which makes evaluating and selecting the best embedding model critically important. While established procedures and benchmarks exist for intrinsic evaluation, we note a conspicuous absence of comprehensive evaluations of intrinsic embedding quality across multiple tasks. This paper introduces vec2best, a unified tool encompassing state-of-the-art intrinsic evaluation tasks across diverse benchmarks. vec2best is a framework for evaluating word embeddings trained with various methods and hyper-parameters on a range of tasks from the literature, and it gives the user an extensive evaluation of each word embedding model. The tool yields a holistic evaluation metric for each model called the PCE (Principal Component Evaluation). We conducted evaluations on 135 word embedding models, trained using GloVe, fastText, and word2vec, across four tasks integrated into vec2best (similarity, analogy, categorization, and outlier detection), along with their respective benchmarks. Additionally, we used vec2best to optimize embedding hyper-parameter configurations in a real-world scenario. vec2best is available as a pip-installable Python package. [ABSTRACT FROM AUTHOR]
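
As an illustration of what one of the intrinsic tasks named in the abstract (similarity) typically involves, the sketch below scores a model by the Spearman rank correlation between its cosine similarities and human ratings for word pairs. It is not vec2best's API and not the authors' implementation: the function names, the toy embeddings, and the toy benchmark are all hypothetical, chosen only to show the general technique.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks of the values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def similarity_score(embeddings, benchmark):
    """Correlate model similarities with human ratings.

    Pairs containing a word missing from the embeddings are skipped.
    """
    model_scores, human_scores = [], []
    for word_a, word_b, rating in benchmark:
        if word_a in embeddings and word_b in embeddings:
            model_scores.append(cosine(embeddings[word_a], embeddings[word_b]))
            human_scores.append(rating)
    return spearman(model_scores, human_scores)


# Hypothetical toy data, for illustration only.
toy_embeddings = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
    "truck": [0.2, 0.8],
}
toy_benchmark = [
    ("cat", "dog", 9.0),
    ("car", "truck", 8.5),
    ("cat", "car", 1.0),
    ("dog", "truck", 2.0),
]

score = similarity_score(toy_embeddings, toy_benchmark)
```

A score near 1 means the model orders word pairs much as the human raters do. How vec2best combines per-task scores into the PCE is not described in this abstract, so no aggregation step is sketched here.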

Details

Language :
English
ISSN :
18669956
Volume :
16
Issue :
3
Database :
Complementary Index
Journal :
Cognitive Computation
Publication Type :
Academic Journal
Accession number :
177596565
Full Text :
https://doi.org/10.1007/s12559-023-10235-3