Word Sense Disambiguation Corpus Development for Romanian Language.

Authors :
Scutelnicu, Liviu Andrei
Source :
Procedia Computer Science; 2023, Vol. 225, p822-831, 10p
Publication Year :
2023

Abstract

Research on interconnecting lexical resources is a real challenge because it addresses the difficult problem of semantic understanding and, more precisely, of disambiguating the meaning of words: Word Sense Disambiguation (WSD). In the current and prospective context of the information society, the fundamental works of a national culture need to exist in digital format. Creating a representative corpus of a language, accessible through the Internet, is a topical issue throughout the world, since such a corpus gives a concrete, clear picture of how that language is used. In this study we describe the development of a Romanian-language GOLD corpus covering words that have multiple meanings. We propose a corpus annotation standard based on three lexical resources: the Thesaurus Dictionary of the Romanian Language in electronic format (eDTLR), from which we extracted a list of words with multiple meanings; the Reference Corpus for Contemporary Romanian Language (CoRoLa), from which we extracted contexts in which these words were found; and the Romanian WordNet (RoWN), from which we took the sense of each word in its corpus context. [ABSTRACT FROM AUTHOR]
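The three-resource workflow in the abstract can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the record layout (`AnnotatedContext`), the helper `build_gold_entries`, the two toy sentences, and the sense labels are hypothetical and do not come from the paper, from eDTLR, from CoRoLa, or from RoWN. The sketch only shows the general shape of pairing a polysemous word with a corpus context and a chosen sense.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedContext:
    """One hypothetical gold-corpus record: a word, a context, a sense label."""
    lemma: str
    context: str
    sense_id: str


def build_gold_entries(polysemous_lemmas, corpus_sentences, sense_choices):
    """Pair each listed lemma with the sentences that contain it.

    polysemous_lemmas: stand-in for a word list taken from a dictionary.
    corpus_sentences:  stand-in for contexts taken from a reference corpus.
    sense_choices:     stand-in for an annotator's choices, keyed by
                       (lemma, sentence), valued by a sense label.
    """
    entries = []
    for sentence in corpus_sentences:
        tokens = sentence.lower().split()
        for lemma in polysemous_lemmas:
            key = (lemma, sentence)
            # Keep only contexts that contain the lemma and have a sense chosen.
            if lemma in tokens and key in sense_choices:
                entries.append(AnnotatedContext(lemma, sentence, sense_choices[key]))
    return entries


# Toy data, invented for this sketch (the Romanian word "lac" can mean
# "lake" or "lacquer"); the sense labels are placeholders, not RoWN identifiers.
lemmas = ["lac"]
sentences = [
    "copiii înoată în lac",
    "masa are un strat de lac",
]
choices = {
    ("lac", sentences[0]): "lac:lake",
    ("lac", sentences[1]): "lac:lacquer",
}

entries = build_gold_entries(lemmas, sentences, choices)
for entry in entries:
    print(entry.lemma, "|", entry.context, "|", entry.sense_id)
```

A real annotation pipeline would need lemmatization rather than whitespace token matching, and sense identifiers drawn from the actual wordnet; both are left out here to keep the sketch self-contained.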

Details

Language :
English
ISSN :
18770509
Volume :
225
Database :
Supplemental Index
Journal :
Procedia Computer Science
Publication Type :
Academic Journal
Accession number :
174059121
Full Text :
https://doi.org/10.1016/j.procs.2023.10.069