PARALLEL CORPORA IN SERBIA – POSSIBILITIES FOR PARALLEL INFORMATION RETRIEVAL IN TWO OR MORE LANGUAGES.

Authors :
Андоновски, Јелена С.
Source :
Bibliotekar (0006-1816). 2021, Vol. 63 Issue 1, p51-74. 24p.
Publication Year :
2021

Abstract

Over recent decades, aligned multilingual corpora have become essential resources for multilingual Natural Language Processing (NLP) and a major resource for researchers in various areas of linguistics and related disciplines. Parallel corpora are language corpora that contain one or more original texts in one language together with their translations into one or more other languages. The original texts and their translations are aligned at some level of text division (e.g. sentence, paragraph, or chapter). Most parallel corpora contain texts in only two languages, but there are also monolingual parallel corpora that contain different editions of the same text in a single language. In Serbia, JeRTeh, the Language Resources and Technologies Society (formerly the Group for Language Technologies), has been developing parallel corpora containing Serbian texts for decades. To date, JeRTeh has developed the Serbian-French aligned corpus (SrpFranKor), the Serbian-English aligned corpus (SrpEngKor), the digital library Biblisha with several parallel collections, and the multilingual edition Multilingual Vern. In addition, Serbian texts are included in the multilingual parallel corpora of Plato’s Republic and Orwell’s 1984, developed within international projects, as well as in several corpora currently under development in the region and worldwide. This paper presents the corpora developed by the Group for Language Technologies, their structure and purpose, and the possibilities they offer for information retrieval. [ABSTRACT FROM AUTHOR]
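The idea of parallel retrieval described above can be illustrated with a minimal sketch. It assumes a corpus already aligned at sentence level, where each alignment unit maps language codes to the corresponding sentence. A query in one language returns the matching sentences together with their aligned translations. This is not the JeRTeh tools or query interface: the `AlignedSegment` structure, the `parallel_search` function, the language codes and the toy sentences are all hypothetical, invented for illustration only.

```python
from dataclasses import dataclass


@dataclass
class AlignedSegment:
    """One alignment unit (hypothetical): a sentence and its translations, keyed by language code."""
    segment_id: str
    texts: dict[str, str]


def parallel_search(corpus, query, source_lang, target_langs):
    """Return (segment_id, source_text, {lang: translation}) for each segment
    whose source-language text contains the query (case-insensitive substring match)."""
    needle = query.casefold()
    hits = []
    for seg in corpus:
        source_text = seg.texts.get(source_lang)
        if source_text is not None and needle in source_text.casefold():
            # Collect only the target languages that are present in this segment.
            translations = {lang: seg.texts[lang] for lang in target_langs if lang in seg.texts}
            hits.append((seg.segment_id, source_text, translations))
    return hits


# Toy trilingual corpus (invented sentences, not taken from any JeRTeh corpus).
CORPUS = [
    AlignedSegment("s1", {"sr": "Добро јутро.", "en": "Good morning.", "fr": "Bonjour."}),
    AlignedSegment("s2", {"sr": "Књига је на столу.", "en": "The book is on the table.",
                          "fr": "Le livre est sur la table."}),
]

if __name__ == "__main__":
    for seg_id, source, translations in parallel_search(CORPUS, "књига", "sr", ["en", "fr"]):
        print(seg_id, source, translations)
```

Plain substring matching keeps the sketch short, but it is a poor fit for a highly inflected language such as Serbian: the query "књига" would not match the accusative form "књигу". A practical system would likely need lemma-based or regular-expression queries instead.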

Details

Language :
Serbian
ISSN :
0006-1816
Volume :
63
Issue :
1
Database :
Academic Search Index
Journal :
Bibliotekar (0006-1816)
Publication Type :
Academic Journal
Accession number :
151529051
Full Text :
https://doi.org/10.18485/bibliotekar.2021.63.1.3