Word Sense Disambiguation Corpus Development for Romanian Language.
- Source :
- Procedia Computer Science; 2023, Vol. 225, p822-831, 10p
- Publication Year :
- 2023
Abstract
- Research on interconnecting lexical resources is a real challenge because it addresses the difficult problem of semantic understanding and, more precisely, of disambiguating the meanings of words: Word Sense Disambiguation (WSD). In the current and prospective context of the information society, it is essential that the fundamental works of a national culture exist in digital format. Creating a representative corpus of a language that is accessible over the Internet is a topical issue worldwide, since such a corpus gives a concrete, clear picture of how the language is used. In this study we describe the development of a Romanian-language GOLD corpus covering words with multiple meanings. We propose a corpus annotation standard based on three lexical resources: from the Thesaurus Dictionary of the Romanian Language in electronic format (eDTLR) we extracted a list of words with multiple meanings; from the Reference Corpus for Contemporary Romanian Language (CoRoLa) we extracted the contexts in which these words were found; and from the Romanian WordNet (RoWN) we took the sense that each word has in its corpus context. [ABSTRACT FROM AUTHOR]
- Subjects :
- ROMANIAN language
- SEMANTICS
- CORPORA
- POLYSEMY
- ENCYCLOPEDIAS & dictionaries
Details
- Language :
- English
- ISSN :
- 1877-0509
- Volume :
- 225
- Database :
- Supplemental Index
- Journal :
- Procedia Computer Science
- Publication Type :
- Academic Journal
- Accession number :
- 174059121
- Full Text :
- https://doi.org/10.1016/j.procs.2023.10.069