Measuring the effect of different types of unsupervised word representations on Medical Named Entity Recognition
- Source :
- International Journal of Medical Informatics. 129:100-106
- Publication Year :
- 2019
- Publisher :
- Elsevier BV, 2019.
Abstract
- Background: This work deals with Natural Language Processing applied to the clinical domain. Specifically, it addresses Medical Entity Recognition (MER) on Electronic Health Records (EHRs). Developing a MER system entailed heavy data preprocessing and feature engineering until Deep Neural Networks (DNNs) emerged. However, the quality of the word representations used in the embedding layers remains an important issue for DNN inference.
  Goal: The main goal of this work is to develop a robust MER system by adapting general-purpose DNNs to cope with the high lexical variability shown in EHRs. In addition, given that EHRs tend to be scarce whereas out-of-domain corpora are available, the aim is to assess the impact of the word representations on MER performance as we move to other domains. To this end, exhaustive experimentation varying information generation methods and network parameters is crucial.
  Methods: We adapted a general-purpose sequential tagger based on Bidirectional Long Short-Term Memory cells and Conditional Random Fields (CRFs) to make it tolerant to high lexical variability and a limited amount of corpora. To this end, we incorporated part-of-speech (POS) and semantic-tag embedding layers into the word representations.
  Results: One of the strengths of this work is the exhaustive evaluation of dense word representations obtained by varying not only the domain and genre but also the learning algorithms and their parameter settings. With the proposed method, we attained an error reduction of 1.71 (5.7%) compared to the state of the art, even though no preprocessing or feature engineering was used.
  Conclusions: Our results indicate that dense representations built taking word order into account improve the entity extraction system. In addition, we found that using a medical corpus (not necessarily EHRs) to infer the representations improves performance, even if it does not belong to the same genre.
- Subjects :
- Feature engineering
Conditional random field
020205 medical informatics
Computer science
Health Informatics
02 engineering and technology
03 medical and health sciences
0302 clinical medicine
Named-entity recognition
0202 electrical engineering, electronic engineering, information engineering
Electronic Health Records
Preprocessor
030212 general & internal medicine
CRFs
Natural Language Processing
Subject Headings
Artificial neural network
Semantics
Neural Networks, Computer
Data pre-processing
Artificial intelligence
Algorithms
Natural language processing
Word order
Details
- ISSN :
- 1386-5056
- Volume :
- 129
- Database :
- OpenAIRE
- Journal :
- International Journal of Medical Informatics
- Accession number :
- edsair.doi.dedup.....dabe1ef05654993c48fcf4ae4edbbdb4
- Full Text :
- https://doi.org/10.1016/j.ijmedinf.2019.05.022