Back to Search Start Over

BioByGANS: biomedical named entity recognition by fusing contextual and syntactic features through graph attention network in node classification framework.

Authors :
Zheng, Xiangwen
Du, Haijian
Luo, Xiaowei
Tong, Fan
Song, Wei
Zhao, Dongsheng
Source :
BMC Bioinformatics. 11/22/2022, Vol. 23 Issue 1, p1-19. 19p.
Publication Year :
2022

Abstract

Background: Automatic and accurate recognition of various biomedical named entities from literature is an important task of biomedical text mining, which is the foundation of extracting biomedical knowledge from unstructured texts into structured formats. Using the sequence labeling framework and deep neural networks to implement biomedical named entity recognition (BioNER) is a common method at present. However, the above method often underutilizes syntactic features such as dependencies and topology of sentences. Therefore, it is an urgent problem to be solved to integrate semantic and syntactic features into the BioNER model. Results: In this paper, we propose a novel biomedical named entity recognition model, named BioByGANS (BioBERT/SpaCy-Graph Attention Network-Softmax), which uses a graph to model the dependencies and topology of a sentence and formulate the BioNER task as a node classification problem. This formulation can introduce more topological features of language and no longer be only concerned about the distance between words in the sequence. First, we use periods to segment sentences and spaces and symbols to segment words. Second, contextual features are encoded by BioBERT, and syntactic features such as part of speeches, dependencies and topology are preprocessed by SpaCy respectively. A graph attention network is then used to generate a fusing representation considering both the contextual features and syntactic features. Last, a softmax function is used to calculate the probabilities and get the results. We conduct experiments on 8 benchmark datasets, and our proposed model outperforms existing BioNER state-of-the-art methods on the BC2GM, JNLPBA, BC4CHEMD, BC5CDR-chem, BC5CDR-disease, NCBI-disease, Species-800, and LINNAEUS datasets, and achieves F1-scores of 85.15%, 78.16%, 92.97%, 94.74%, 87.74%, 91.57%, 75.01%, 90.99%, respectively. Conclusion: The experimental results on 8 biomedical benchmark datasets demonstrate the effectiveness of our model, and indicate that formulating the BioNER task into a node classification problem and combining syntactic features into the graph attention networks can significantly improve model performance. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
14712105
Volume :
23
Issue :
1
Database :
Academic Search Index
Journal :
BMC Bioinformatics
Publication Type :
Academic Journal
Accession number :
160371176
Full Text :
https://doi.org/10.1186/s12859-022-05051-9