Chinese Named Entity Recognition in the Geoscience Domain Based on BERT.

Authors :
Lv, Xia
Xie, Zhong
Xu, Dexin
Jin, Xiangguo
Ma, Kai
Tao, Liufeng
Qiu, Qinjun
Pan, Yongsheng
Source :
Earth & Space Science. Mar2022, Vol. 9 Issue 3, p1-15. 15p.
Publication Year :
2022

Abstract

Geological reports are frequently used by geologists involved in geological surveys and scientific research to record the results and outcomes of geological surveys. With such a rich data source, a substantial amount of knowledge has yet to be mined and analyzed. This paper focuses on automatic information extraction from geological reports, namely, geological named entity recognition. Geological named entity recognition has an important role in data mining, knowledge discovery and knowledge graph construction. Existing general named entity recognition models and tools are limited in the geoscience domain because of the language irregularities associated with geological text, such as informal sentence structures, domain‐specific geoscience terms, large character lengths and multiple combinations of independent words. We present BERT‐BiGRU‐CRF, which combines Bidirectional Encoder Representations from Transformers (BERT), a bidirectional gated recurrent unit network (BiGRU) and a conditional random field (CRF) in a deep learning‐based geological named entity recognition model designed specifically with these linguistic irregularities in mind. The integrated model uses the BERT pretrained language model to obtain character vectors rich in semantic information, which compensates for the lack of specificity of static word vectors (e.g., word2vec) and improves the extraction of complex geological entities. We demonstrate our proposed model by applying it to four test datasets, including a geoscience NER data set from regional geological reports, and by comparing its performance with those of five baseline models.

Plain Language Summary: Geological named entity recognition has an important role in information extraction and knowledge discovery. This paper presents BERT‐BiGRU‐CRF, which is a deep learning‐based geological named entity recognition model that is designed specifically with the linguistic irregularities of geological text in mind. We hope that our approach will serve as an alternative method that deserves further study.

Key Points:
A named entity recognition model based on BERT‐BiGRU‐CRF is proposed, and a set of detailed experimental tests is conducted on a domain data set.
Our model is compared in detail with four other mainstream models, and experiments demonstrate that our model obtains better performance.
We share the source code of BERT‐BiGRU‐CRF and the annotated test data.

[ABSTRACT FROM AUTHOR]
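
Illustrative note (not part of the author abstract): the CRF stage of a BERT‐BiGRU‐CRF tagger is usually decoded with the Viterbi algorithm, which picks the best-scoring label sequence given per-character scores from the encoder and learned label-to-label transition scores. The sketch below shows only that decoding step in plain Python. The label names (`B-ROCK`, `I-ROCK`, `O`), the numeric scores and the function name `viterbi_decode` are assumptions made for illustration; they are not taken from the paper or its released code, and start/end transition scores are omitted for brevity.

```python
from typing import Dict, List


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence for one sentence.

    emissions[i][t]   : score of tag t for character i (e.g. from BERT + BiGRU)
    transitions[p][c] : score of moving from tag p to tag c (CRF parameters)
    """
    if not emissions:
        return []

    # Best score of any path that ends in each tag at the first character.
    score = {t: emissions[0][t] for t in tags}
    backpointers: List[Dict[str, str]] = []

    for emit in emissions[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            # Pick the previous tag that gives the best path into `cur`.
            best_prev = max(tags, key=lambda p: score[p] + transitions[p][cur])
            new_score[cur] = score[best_prev] + transitions[best_prev][cur] + emit[cur]
            pointers[cur] = best_prev
        score = new_score
        backpointers.append(pointers)

    # Trace the best path backwards from the best final tag.
    path = [max(tags, key=lambda t: score[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical label set and made-up scores for a three-character entity.
TAGS = ["O", "B-ROCK", "I-ROCK"]
TRANSITIONS = {
    "O": {"O": 0.5, "B-ROCK": 0.0, "I-ROCK": -10.0},  # O -> I is invalid in BIO
    "B-ROCK": {"O": 0.0, "B-ROCK": -10.0, "I-ROCK": 1.0},
    "I-ROCK": {"O": 0.0, "B-ROCK": 0.0, "I-ROCK": 0.5},
}
EMISSIONS = [
    {"O": 0.1, "B-ROCK": 1.0, "I-ROCK": 1.2},
    {"O": 0.2, "B-ROCK": 0.3, "I-ROCK": 2.0},
    {"O": 0.1, "B-ROCK": 0.2, "I-ROCK": 1.5},
]

if __name__ == "__main__":
    print(viterbi_decode(EMISSIONS, TRANSITIONS, TAGS))
```

In this toy input the first character's highest emission score is for `I-ROCK`, yet the decoded sequence starts with `B-ROCK`, because the transition scores reward a begin-then-inside pattern. That is the kind of sequence-level constraint a CRF layer adds on top of per-character predictions.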

Details

Language :
English
ISSN :
23335084
Volume :
9
Issue :
3
Database :
Academic Search Index
Journal :
Earth & Space Science
Publication Type :
Academic Journal
Accession number :
156006300
Full Text :
https://doi.org/10.1029/2021EA002166