Evaluating language models for the retrieval and categorization of lexical collocations

Authors :: Joan Codina-Filbà
Leo Wanner
Luis Espinosa Anke
Source :: EACL
Publication Year :: 2021
Publisher :: ACL (Association for Computational Linguistics), 2021.
Abstract: Paper presented at: EACL 2021, held online from 19 to 23 April 2021. Lexical collocations are idiosyncratic combinations of two syntactically bound lexical items (e.g., “heavy rain”, “take a step” or “undergo surgery”). Understanding their degree of compositionality and idiosyncrasy, as well as their underlying semantics, is crucial for language learners, lexicographers and downstream NLP applications alike. In this paper we analyse a suite of language models for collocation understanding. We first construct a dataset of occurrences of lexical collocations in context, categorized into 16 representative semantic categories. Then, we perform two experiments: (1) unsupervised collocate retrieval, and (2) supervised collocation classification in context. We find that most models perform well in distinguishing light verb constructions, especially if the collocation’s first argument acts as a subject, but often fail to distinguish, first, different syntactic structures within the same semantic category, and second, finer-grained categories which restrict the set of correct collocates. This work was partially supported by the European Commission via its H2020 Program under the contract number 870930.

Subjects :: Lexicology
Collocation
Light verb
Computer science
Principle of compositionality
Context (language use)
Semantics
Lexical item
Categorization
Artificial intelligence
Language model
Natural language processing
