Probabilistic vs deep learning based approaches for narrow domain NER in Spanish.

Authors :
Ramos-Flores, Orlando
Pinto, David
Montes-y-Gómez, Manuel
Vázquez, Andrés
Singh, Vivek
Perez, Fernando
Source :
Journal of Intelligent & Fuzzy Systems. 2020, Vol. 39 Issue 2, p2015-2025. 11p.
Publication Year :
2020

Abstract

This work presents an experimental study on the task of Named Entity Recognition (NER) for a narrow domain in the Spanish language. The study considers two approaches commonly used for this kind of problem, namely a Conditional Random Fields (CRF) model and a Recurrent Neural Network (RNN). For the latter, we employed a bidirectional Long Short-Term Memory (BiLSTM) network with ELMo pre-trained word embeddings for Spanish. The comparison between the probabilistic model and the deep learning model was carried out on two collections: the Spanish dataset from CoNLL-2002, with four classes under the IOB tagging scheme, and a Mexican Spanish news dataset with seventeen classes under the IOBES scheme. The paper presents an analysis of the scalability, robustness, and common errors of both models. In general, this analysis indicates that the BiLSTM-ELMo model is more suitable than the CRF model when there is "enough" training data, and that it is more scalable, as its performance was not significantly affected in the incremental experiments (adding one class at a time). On the other hand, the results indicate that the CRF model is better suited to scenarios with small training datasets and many classes. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
10641246
Volume :
39
Issue :
2
Database :
Academic Search Index
Journal :
Journal of Intelligent & Fuzzy Systems
Publication Type :
Academic Journal
Accession number :
145429338
Full Text :
https://doi.org/10.3233/JIFS-179868