A Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers

Authors :
Thi Tuyet Hai Nguyen
Emanuela Boros
Jose G. Moreno
Elvys Linhares Pontes
Ahmed Hamdi
Günter Hackl
Antoine Doucet
Laboratoire Informatique, Image et Interaction - EA 2118 (L3I)
Université de La Rochelle (ULR)
Universität Innsbruck [Innsbruck]
Recherche d’Information et Synthèse d’Information (IRIT-IRIS)
Institut de recherche en informatique de Toulouse (IRIT)
Université Toulouse 1 Capitole (UT1)
Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3)
Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP)
Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1)
Université Fédérale Toulouse Midi-Pyrénées
Université Toulouse III - Paul Sabatier (UT3)
Source :
Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21), Jul 2021, Virtual Event, Canada, pp. 2328-2334, ⟨10.1145/3404835.3463255⟩
Publication Year :
2021
Publisher :
Zenodo, 2021.

Abstract

Named entity processing over historical texts is increasingly used because of the massive volume of documents and archives stored in digital libraries. However, because annotated historical resources are scarce, information extraction performance lags behind that on contemporary texts. In this paper, we introduce the development of the NewsEye resource, a multilingual dataset for named entity recognition and linking, enriched with stances towards named entities. The dataset comprises diachronic historical newspaper material published between 1850 and 1950 in French, German, Finnish, and Swedish. Such a historical resource is essential for developing and evaluating named entity processing systems. It also makes it possible to improve the performance of existing approaches on historical documents, which enables adequate and efficient semantic indexing of historical documents in digital cultural heritage collections.

Details

Language :
English
ISBN :
978-1-4503-8037-9
Database :
OpenAIRE
Journal :
Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21), Jul 2021, Virtual Event, Canada, pp. 2328-2334, ⟨10.1145/3404835.3463255⟩
Accession number :
edsair.doi.dedup.....055edb59fcddcd4f53749d1fdc716b23
Full Text :
https://doi.org/10.5281/zenodo.4694465