The first named entity recognizer in Maithili: Resource creation and system development.

Authors :
Priyadarshi, Ankur
Saha, Sujan Kumar
Source :
Journal of Intelligent & Fuzzy Systems. 2021, Vol. 41 Issue 1, p1083-1095. 13p.
Publication Year :
2021

Abstract

In this paper, we present our work on developing a Maithili Named Entity Recognition (NER) system. Maithili is one of the official languages of India, with around 50 million native speakers. Although NER systems have been developed for several Indian languages, we did not find any openly available NER resource or system for Maithili. To this end, we manually annotated a Maithili NER corpus of around 200K words. We built a baseline classifier using Conditional Random Fields (CRF) and then ran many experiments with various recurrent neural network (RNN) models. We also collected a larger raw corpus to train better word and character embeddings. Our experiments show that the neural models outperform the CRF baseline, that a CRF output layer is effective for predicting the final labels in the RNN models, and that character embeddings are effective for Maithili. We also investigated the effectiveness of gazetteer lists in neural models: we built several gazetteer lists from various web resources and incorporated them into the neural models through a gazetteer layer, which improved performance. The final system achieved an F-measure of 91.6%, with 94.9% precision and 88.53% recall. [ABSTRACT FROM AUTHOR]
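The reported scores can be cross-checked with a short calculation. The sketch below assumes the F-measure is the balanced F1 score, i.e. the harmonic mean of precision and recall; the abstract does not state this explicitly, but the reported figures are consistent with it. The function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F1: the harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
p, r = 94.9, 88.53
print(f"F-measure: {f_measure(p, r):.1f}%")  # prints "F-measure: 91.6%"
```

Because the harmonic mean is pulled toward the smaller value, the overall score (about 91.6%) sits closer to the recall than to the precision.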

Details

Language :
English
ISSN :
1064-1246
Volume :
41
Issue :
1
Database :
Academic Search Index
Journal :
Journal of Intelligent & Fuzzy Systems
Publication Type :
Academic Journal
Accession number :
152233463
Full Text :
https://doi.org/10.3233/JIFS-210051