A Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers
- Source :
- SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Jul 2021, Virtual Event, Canada, pp. 2328-2334. DOI: 10.1145/3404835.3463255. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR).
- Publication Year :
- 2021
- Publisher :
- Zenodo, 2021.
Abstract
- Named entity processing over historical texts is increasingly used, owing to the massive number of documents and archives stored in digital libraries. However, because annotated historical resources are scarce, information extraction performance on historical texts lags behind that on contemporary texts. In this paper, we describe the development of the NewsEye resource, a multilingual dataset for named entity recognition and linking enriched with stances towards named entities. The dataset comprises diachronic historical newspaper material published between 1850 and 1950 in French, German, Finnish, and Swedish. Such a historical resource is essential for developing and evaluating named entity processing systems. It also makes it possible to improve the performance of existing approaches on historical documents, which in turn enables adequate and efficient semantic indexing of historical documents in digital cultural heritage collections.
- Subjects :
- Computer science
named entity recognition
Context (language use)
02 engineering and technology
entity linking
computer.software_genre
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
Newspaper
[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
World Wide Web
Entity linking
diachronic historical newspapers
Named-entity recognition
[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
[INFO.INFO-DL]Computer Science [cs]/Digital Libraries [cs.DL]
[INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC]
datasets
05 social sciences
Search engine indexing
Digital library
Cultural heritage
[INFO.INFO-TT]Computer Science [cs]/Document and Text Processing
Information extraction
[INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]
0509 other social sciences
multilingual
050904 information & library sciences
stance detection
computer
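The abstract describes three annotation layers (entity mentions, entity links, and stances towards entities). As a minimal illustration of how such layers can fit together, the following Python sketch decodes token-level tags into entity mentions that each carry a link and a stance. It assumes a BIO tagging scheme, Wikidata-style identifiers, and stance labels such as "neutral"; these are illustrative assumptions, not the published NewsEye file format, and `EntityMention` and `decode_bio` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EntityMention:
    text: str                     # surface form, tokens joined by spaces
    label: str                    # entity type, e.g. "PER" or "LOC" (assumed tag set)
    start: int                    # index of the first token
    end: int                      # index one past the last token
    link: Optional[str] = None    # knowledge-base ID, e.g. a Wikidata QID (assumed)
    stance: Optional[str] = None  # e.g. "positive", "neutral", "negative" (assumed)


def decode_bio(
    tokens: List[str],
    tags: List[str],
    links: List[Optional[str]],
    stances: List[Optional[str]],
) -> List[EntityMention]:
    """Group BIO-tagged tokens into mentions (hypothetical helper).

    The link and stance of a mention are taken from its first token.
    An I- tag that does not continue a mention of the same type is
    treated leniently as the start of a new mention.
    """
    mentions: List[EntityMention] = []
    current: Optional[EntityMention] = None
    for i, tag in enumerate(tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and (current is None or current.label != tag[2:])
        )
        if starts_new:
            if current is not None:
                mentions.append(current)
            current = EntityMention(tokens[i], tag[2:], i, i + 1, links[i], stances[i])
        elif tag.startswith("I-"):
            # Continue the open mention of the same type.
            current.text += " " + tokens[i]
            current.end = i + 1
        else:
            # "O" (or any other tag) closes the open mention, if any.
            if current is not None:
                mentions.append(current)
            current = None
    if current is not None:
        mentions.append(current)
    return mentions
```

For example, the tokens `["Raymond", "Poincaré", "visite", "Paris", "."]` tagged `["B-PER", "I-PER", "O", "B-LOC", "O"]` decode into two mentions, "Raymond Poincaré" (PER, tokens 0 to 2) and "Paris" (LOC, tokens 3 to 4). Taking the link and stance from the first token is a simplifying design choice of this sketch.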
Details
- Language :
- English
- ISBN :
- 978-1-4503-8037-9
- ISBNs :
- 9781450380379
- Database :
- OpenAIRE
- Journal :
- SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Jul 2021, Virtual Event, Canada, pp. 2328-2334. DOI: 10.1145/3404835.3463255. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR).
- Accession number :
- edsair.doi.dedup.....055edb59fcddcd4f53749d1fdc716b23
- Full Text :
- https://doi.org/10.5281/zenodo.4694465