GNER: A Generative Model for Geological Named Entity Recognition Without Labeled Data Using Deep Learning.

Authors :
Qiu, Qinjun
Xie, Zhong
Wu, Liang
Tao, Liufeng
Source :
Earth & Space Science. Jun2019, Vol. 6 Issue 6, p931-946. 16p.
Publication Year :
2019

Abstract

A variety of detailed data about geological topics and geoscience knowledge are buried in the geoscience literature and rarely used. Named entity recognition (NER) provides both opportunities and challenges to leverage this wealth of data in the geoscience literature for data analysis and further information extraction. Existing NER models and techniques are mainly based on rule‐based and supervised approaches, and developing such systems requires a costly manual effort. In this paper, we first design a generic stepwise framework for domain‐specific NER. Following this framework, domain‐specific entities and domain‐general words are collected and selected as seed terms. Normalization and grouping processes are then applied to these seed terms for further analysis. A random extraction algorithm based on a unigram language model is used to generate a large‐scale training data set consisting of probabilistically labeled pseudosentences. Each generated sentence is then used as input to the self‐training and learning algorithm. Experimental results on two constructed data sets demonstrate that the proposed model effectively recognizes and identifies geological named entities.

Plain Language Summary: Existing entity recognition and classification methods are less functional for automatic geological name entity recognition. In this paper, we propose a stepwise unsupervised approach to geological name entity recognition in geological reports that requires no labeled data. This approach dynamically adapts to extract and identify unseen instances. We hope that our approach will serve as an alternative method that deserves further study.

Key Points:
Geological named entities are extracted from unstructured Chinese geoscience reports using deep learning.
The proposed framework can be easily extended to other subject domains through fine‐tuning.
A training data set is constructed based on both domain‐specific entities and domain‐general words.

[ABSTRACT FROM AUTHOR]
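
The pseudosentence generation step mentioned in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the algorithm's details, so the function names (`unigram_distribution`, `generate_pseudosentence`), the `ENT`/`O` label set, the `entity_ratio` parameter, and the English placeholder seed terms are all hypothetical. The sketch draws each token from a unigram distribution over either the entity seeds or the general words and records which pool the token came from as its label.

```python
import random


def unigram_distribution(term_counts):
    """Convert raw seed-term counts into unigram probabilities."""
    total = sum(term_counts.values())
    return {term: count / total for term, count in term_counts.items()}


def generate_pseudosentence(entity_probs, general_probs, length, entity_ratio, rng):
    """Sample `length` tokens and label each by the pool it was drawn from.

    With probability `entity_ratio` a token is drawn from the entity seeds
    (label "ENT"); otherwise it is drawn from the general words (label "O").
    Both the ratio and the label names are assumptions made for this sketch.
    """
    tokens, labels = [], []
    for _ in range(length):
        if rng.random() < entity_ratio:
            pool, label = entity_probs, "ENT"
        else:
            pool, label = general_probs, "O"
        terms = list(pool)
        weights = [pool[term] for term in terms]
        # Weighted draw according to the unigram probabilities of the pool.
        tokens.append(rng.choices(terms, weights=weights, k=1)[0])
        labels.append(label)
    return tokens, labels


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    # Hypothetical seed terms with made-up counts.
    entity_probs = unigram_distribution({"granite": 3, "basalt": 1, "fault": 2})
    general_probs = unigram_distribution({"the": 5, "near": 2, "is": 3})
    tokens, labels = generate_pseudosentence(
        entity_probs, general_probs, length=8, entity_ratio=0.3, rng=rng
    )
    for token, label in zip(tokens, labels):
        print(f"{token}\t{label}")
```

Repeating the call many times would yield a large set of labeled pseudosentences of the kind the abstract describes as input to the self‐training step; how the original work assigns probabilistic labels may differ from this simple pool-based labeling.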

Details

Language :
English
ISSN :
23335084
Volume :
6
Issue :
6
Database :
Academic Search Index
Journal :
Earth & Space Science
Publication Type :
Academic Journal
Accession number :
137772250
Full Text :
https://doi.org/10.1029/2019EA000610