Measuring the effect of different types of unsupervised word representations on Medical Named Entity Recognition

Authors :
Iakes Goenaga
Arantza Casillas
Xabier Soto
Nerea Ezeiza
Alicia Pérez
Source :
International Journal of Medical Informatics. 129:100-106
Publication Year :
2019
Publisher :
Elsevier BV, 2019.

Abstract

Background: This work deals with Natural Language Processing applied to the clinical domain, specifically Medical Entity Recognition (MER) on Electronic Health Records (EHRs). Until Deep Neural Networks (DNNs) emerged, developing a MER system entailed heavy data preprocessing and feature engineering. However, the quality of the word representations, in the form of embedding layers, remains an important factor in DNN inference.

Goal: The main goal of this work is to develop a robust MER system by adapting general-purpose DNNs to cope with the high lexical variability found in EHRs. In addition, given that EHRs tend to be scarce while out-of-domain corpora are available, the aim is to assess how the word representations affect MER performance as we move to other domains. To that end, exhaustive experimentation varying the representation-learning methods and the network parameters is crucial.

Methods: We adapted a general-purpose sequential tagger based on Bidirectional Long Short-Term Memory (BiLSTM) cells and Conditional Random Fields (CRFs) to make it tolerant to high lexical variability and a limited amount of training data. To this end, we added part-of-speech (POS) and semantic-tag embedding layers to the word representations.

Results: One of the strengths of this work is the exhaustive evaluation of dense word representations, obtained by varying not only the domain and genre of the corpora but also the learning algorithms and their parameter settings. With the proposed method, we attained an error reduction of 1.71 (5.7%) compared to the state of the art, even though no preprocessing or feature engineering was used.

Conclusions: Our results indicate that dense representations built taking word order into account benefit the entity extraction system. Moreover, we found that using a medical corpus (not necessarily EHRs) to infer the representations improves performance, even if it does not belong to the same genre.
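The Methods describe enriching each word representation with POS and semantic-tag embeddings. A minimal sketch of one plausible way to do this is to concatenate the three lookups into a single input vector per token. The abstract does not say how the layers are combined, so the concatenation, the dimensions, the tag names, the toy vocabulary, and the helper names (`make_table`, `token_input`) are all assumptions made for illustration, and the random vectors stand in for trained embeddings.

```python
import random

# Illustrative sizes only; the abstract does not report embedding dimensions.
WORD_DIM, POS_DIM, SEM_DIM = 8, 3, 2


def make_table(symbols, dim, seed):
    """Build a toy lookup table: one small random vector per symbol."""
    rng = random.Random(seed)
    return {s: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for s in symbols}


# Hypothetical vocabularies; real tables would come from pretrained embeddings.
word_table = make_table(["<unk>", "patient", "takes", "aspirin"], WORD_DIM, seed=0)
pos_table = make_table(["NOUN", "VERB"], POS_DIM, seed=1)
sem_table = make_table(["NONE", "DRUG"], SEM_DIM, seed=2)


def token_input(word, pos, sem):
    """Concatenate word, POS, and semantic-tag vectors for one token.

    Words missing from the vocabulary fall back to the <unk> vector, so the
    POS and semantic-tag parts still carry information about them.
    """
    word_vec = word_table.get(word.lower(), word_table["<unk>"])
    return word_vec + pos_table[pos] + sem_table[sem]


# "ibuprofen" is out of vocabulary here, mimicking high lexical variability.
sentence = [
    ("Patient", "NOUN", "NONE"),
    ("takes", "VERB", "NONE"),
    ("ibuprofen", "NOUN", "DRUG"),
]
inputs = [token_input(*token) for token in sentence]
```

In a tagger of the kind described, a sequence like `inputs` would then be fed to the BiLSTM, with the CRF layer scoring the resulting label sequence. Under this assumed design, an unseen word still receives a distinct input vector through its POS and semantic-tag components.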

Details

ISSN :
13865056
Volume :
129
Database :
OpenAIRE
Journal :
International Journal of Medical Informatics
Accession number :
edsair.doi.dedup.....dabe1ef05654993c48fcf4ae4edbbdb4
Full Text :
https://doi.org/10.1016/j.ijmedinf.2019.05.022