
Infectious risk events and their novelty in event-based surveillance: new definitions and annotated corpus.

Authors :
Delon, François
Bédubourg, Gabriel
Bouscarrat, Léo
Meynard, Jean-Baptiste
Valois, Aude
Queyriaux, Benjamin
Ramisch, Carlos
Tanti, Marc
Source :
Language Resources & Evaluation. Mar 2024, pp. 1-19.
Publication Year :
2024

Abstract

Event-based surveillance (EBS) involves analysing an ever-increasing volume of documents, which calls for automated processing to support human analysts. Few annotated corpora are available for evaluating information extraction tools in the EBS domain. The main objective of this work was to build a corpus of documents representative of those collected in current EBS information systems, and to annotate them with events and their novelty. We proposed new definitions of infectious events and their novelty suited to the background work of analysts in the EBS domain, and we compiled a corpus of 305 documents describing 283 infectious events. Thirty-six of the included documents were in French, representing a total of 11 events; the remainder were in English. We annotated novelty for the 110 most recent documents in the corpus, yielding 101 events. The inter-annotator agreement was 0.74 for event identification (F1-score) and 0.69 [95% CI: 0.51; 0.88] (Kappa) for novelty annotation. The overall agreement for entity annotation was lower, with significant variation according to the type of entity considered (range 0.30–0.68). This corpus is a useful tool for developing and evaluating algorithms and methods proposed by EBS research teams for event detection and novelty annotation, with the aim of improving the operational processing of document flows. The small size of the corpus makes it less suitable for training natural language processing models, although this limitation tends to fade when few-shot learning methods are used. [ABSTRACT FROM AUTHOR]
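
The abstract reports inter-annotator agreement as an F1-score (event identification) and a Kappa coefficient (novelty annotation). As a rough illustration of how such figures are commonly computed, the sketch below implements Cohen's kappa for two annotators' labels and a pairwise F1 over the sets of events each annotator identified. It is not the authors' procedure: the record does not state which Kappa variant or which event-matching criterion was used, so exact-match comparison, the function names, and the toy labels are all assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: the abstract only says "Kappa"; Cohen's two-rater
    variant is used here for illustration.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1.0 - expected)


def pairwise_f1(events_a, events_b):
    """F1 between two annotators' event sets, treating A as the reference.

    Assumption: events match only when their identifiers are equal;
    the paper's actual matching criterion is not given in this record.
    """
    set_a, set_b = set(events_a), set(events_b)
    if not set_a and not set_b:
        return 1.0
    matched = len(set_a & set_b)
    if matched == 0:
        return 0.0
    precision = matched / len(set_b)
    recall = matched / len(set_a)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy annotations, not taken from the corpus.
novelty_a = ["new", "new", "known", "known"]
novelty_b = ["new", "known", "known", "known"]
kappa = cohen_kappa(novelty_a, novelty_b)

events_a = ["e1", "e2", "e3"]
events_b = ["e2", "e3", "e4"]
f1 = pairwise_f1(events_a, events_b)
```

With exact matching, F1 is symmetric in the two annotators, so the choice of which annotator serves as the reference does not change the score.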

Details

Language :
English
ISSN :
1574-020X
Database :
Academic Search Index
Journal :
Language Resources & Evaluation
Publication Type :
Academic Journal
Accession number :
175843667
Full Text :
https://doi.org/10.1007/s10579-024-09728-w