Introducing RONEC -- the Romanian Named Entity Corpus

Authors :
Dumitrescu, Stefan Daniel
Avram, Andrei-Marius
Publication Year :
2019

Abstract

We present RONEC - the Named Entity Corpus for the Romanian language. The corpus contains over 26,000 entities in ~5,000 annotated sentences, belonging to 16 distinct classes. The sentences have been extracted from a copyright-free newspaper, covering several styles. This corpus represents the first initiative in the Romanian language space specifically targeting named entity recognition. It is available in BRAT and CoNLL-U Plus formats, and it is free to use and extend at github.com/dumitrescustefan/ronec.

Comment: 8 pages + annex, accepted to LREC2020 in the main conference

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.1909.01247
Document Type :
Working Paper