Parallel sequence tagging for concept recognition.
- Source :
- BMC bioinformatics [BMC Bioinformatics] 2022 Mar 24; Vol. 22 (Suppl 1), pp. 623. Date of Electronic Publication: 2022 Mar 24.
- Publication Year :
- 2022
Abstract
- Background: Named Entity Recognition (NER) and Normalisation (NEN) are core components of any text-mining system for biomedical texts. In a traditional concept-recognition pipeline, these tasks are combined in a serial way, which is inherently prone to error propagation from NER to NEN. We propose a parallel architecture, where both NER and NEN are modeled as a sequence-labeling task, operating directly on the source text. We examine different harmonisation strategies for merging the predictions of the two classifiers into a single output sequence.
- Results: We test our approach on the recent Version 4 of the CRAFT corpus. In all 20 annotation sets of the concept-annotation task, our system outperforms the pipeline system reported as a baseline in the CRAFT shared task, a competition of the BioNLP Open Shared Tasks 2019. We further refine the systems from the shared task by optimising the harmonisation strategy separately for each annotation set.
- Conclusions: Our analysis shows that the strengths of the two classifiers can be combined in a fruitful way. However, prediction harmonisation requires individual calibration on a development set for each annotation set. This allows achieving a good trade-off between established knowledge (training set) and novel information (unseen concepts). (© 2022. The Author(s).)
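The abstract does not spell out the harmonisation strategies, so the following is only a minimal sketch of what merging two parallel taggers' outputs could look like. Everything in it is an assumption made for illustration: the function name `harmonise`, the strategy names `ids-first` and `spans-first`, the `UNKNOWN` fallback label, and the placeholder identifiers `ID:1` and `ID:2` are hypothetical and are not taken from the paper.

```python
from typing import List

OUTSIDE = "O"  # label for tokens that belong to no concept mention


def harmonise(
    span_tags: List[str],
    id_tags: List[str],
    strategy: str = "ids-first",
    fallback: str = "UNKNOWN",
) -> List[str]:
    """Merge per-token predictions of two taggers into one label sequence.

    span_tags: output of a span tagger (NER), e.g. "B", "I" or "O".
    id_tags:   output of a concept-ID tagger (NEN), an identifier or "O".
    strategy:  hypothetical merge rule, "ids-first" or "spans-first".
    fallback:  hypothetical label for a mention without a predicted ID.
    """
    if len(span_tags) != len(id_tags):
        raise ValueError("both taggers must label the same tokens")

    merged = []
    for span, concept_id in zip(span_tags, id_tags):
        in_span = span != OUTSIDE
        has_id = concept_id != OUTSIDE
        if strategy == "ids-first":
            # Trust the ID tagger; use the span tagger only where it is silent.
            label = concept_id if has_id else (fallback if in_span else OUTSIDE)
        elif strategy == "spans-first":
            # The span tagger decides whether a token is part of a mention.
            label = (concept_id if has_id else fallback) if in_span else OUTSIDE
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        merged.append(label)
    return merged


# Illustrative input for five tokens; the identifiers are placeholders.
span_tags = ["B", "I", "I", "O", "O"]
id_tags = ["O", "ID:1", "ID:1", "O", "ID:2"]

ids_first = harmonise(span_tags, id_tags, strategy="ids-first")
spans_first = harmonise(span_tags, id_tags, strategy="spans-first")
```

In this sketch the two strategies differ only on the last token, where the ID tagger predicts a concept that the span tagger does not support. Choosing the strategy per annotation set on a development set, as the abstract describes, would amount to picking whichever rule scores better there.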
- Subjects :
- Data Mining
Details
- Language :
- English
- ISSN :
- 1471-2105
- Volume :
- 22
- Issue :
- Suppl 1
- Database :
- MEDLINE
- Journal :
- BMC bioinformatics
- Publication Type :
- Academic Journal
- Accession number :
- 35331131
- Full Text :
- https://doi.org/10.1186/s12859-021-04511-y