Testing Contextualized Word Embeddings to Improve NER in Spanish Clinical Case Narratives
- Source :
- IEEE Access, Vol 8, Pp 164717-164726 (2020)
- Publication Year :
- 2020
- Publisher :
- IEEE, 2020.
Abstract
- Background: In the Big Data era, there is an increasing need to fully exploit and analyse the huge quantity of available health information. Natural Language Processing (NLP) technologies can help extract relevant information from unstructured data contained in Electronic Health Records (EHR), such as clinical notes, patients’ discharge summaries and radiology reports, among others. The extracted information could support health-related decision-making processes. Named entity recognition (NER), which detects important concepts in texts (diseases, symptoms, drugs, etc.), is a crucial task in information extraction, especially in languages other than English. In this work, we develop a deep learning-based NLP pipeline for biomedical entity extraction in Spanish clinical narratives.
Methods: We explore the use of contextualized word embeddings to enhance named entity recognition in Spanish-language clinical text, particularly of pharmacological substances, compounds, and proteins. Various combinations of word and sense embeddings were tested on the evaluation corpus of the PharmacoNER 2019 task, the Spanish Clinical Case Corpus (SPACCC). This data set consists of clinical case sections derived from open access Spanish-language medical publications.
Results: The NER system integrates in-domain pre-trained Flair and FastText word embeddings, byte-pair encoded embeddings, and bi-LSTM-based character-level word embeddings. The system achieved its best performance with an F-score of 90.84%. Error analysis showed that the main source of errors for the best model was newly detected false positive entities; in half of these errors, the detected entity was longer than the actual one.
Conclusions: Our study shows that a deep-learning-based system using domain-specific contextualized embeddings, stacked with complementary embeddings, outperforms a system that integrates standard, general-domain word embeddings. With this system, we achieve performance competitive with the state of the art.
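The F-score and the error category mentioned in the Results (false positives whose detected span is longer than the actual entity) can be illustrated with a minimal Python sketch. It assumes strict matching, where a predicted entity counts as correct only if its offsets and label both match a gold entity; the official PharmacoNER evaluation may differ in details. The `Entity` type, the function names, the labels, and the offsets below are all hypothetical and are not taken from the paper or the corpus.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    start: int   # character offset where the mention begins
    end: int     # character offset where the mention ends (exclusive)
    label: str   # entity type; the labels used below are illustrative only


def strict_f_score(gold: set, predicted: set) -> tuple:
    """Return (precision, recall, F1) under exact span-and-label matching."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def longer_than_gold(gold: set, predicted: set) -> set:
    """False positives that fully cover a same-label gold span but are longer."""
    false_positives = predicted - gold
    return {
        p for p in false_positives
        if any(
            p.label == g.label
            and p.start <= g.start and g.end <= p.end
            and (p.end - p.start) > (g.end - g.start)
            for g in gold
        )
    }


# Toy annotations (made up for illustration, not taken from SPACCC).
gold = {Entity(0, 9, "CHEMICAL"), Entity(20, 27, "PROTEIN"), Entity(40, 48, "CHEMICAL")}
predicted = {Entity(0, 9, "CHEMICAL"), Entity(18, 27, "PROTEIN"), Entity(60, 66, "CHEMICAL")}

precision, recall, f1 = strict_f_score(gold, predicted)
too_long = longer_than_gold(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}, over-long spans: {len(too_long)}")
```

In this toy case one of three predictions matches exactly, one is a false positive that extends a gold span two characters to the left, and one is a spurious detection with no gold counterpart.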
- Subjects :
- Spanish language
General Computer Science
Computer science
named entity recognition
Context (language use)
010501 environmental sciences
computer.software_genre
01 natural sciences
03 medical and health sciences
Named-entity recognition
Information system
General Materials Science
natural language processing
contextualized word embeddings
030304 developmental biology
0105 earth and related environmental sciences
0303 health sciences
Clinical case narratives
business.industry
General Engineering
deep learning
Unstructured data
Information extraction
Task analysis
Artificial intelligence
lcsh:Electrical engineering. Electronics. Nuclear engineering
business
language representations
computer
lcsh:TK1-9971
Word (computer architecture)
Natural language processing
Details
- Language :
- English
- ISSN :
- 21693536
- Volume :
- 8
- Database :
- OpenAIRE
- Journal :
- IEEE Access
- Accession number :
- edsair.doi.dedup.....ce5ef93df9f8b5655c9dd019ce00257c