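1. Named entity recognition for the diagnosis and treatment of aquatic animal diseases, oriented toward knowledge graph construction.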
- Author
- 刘巨升, 惠宁, 孙哲涛, 杨鹤, 邵立铭, 于红, 张思佳, and 叶仕根
- Abstract
Disease diagnosis and treatment are an important support for aquatic animal health in aquaculture. A knowledge graph is an effective way to express and apply knowledge about aquatic animal disease diagnosis and treatment. Named entity recognition is a key component in constructing such a knowledge graph, but polysemy and entity nesting lower the recognition accuracy of named entities in this domain. In this study, a named entity recognition model for aquatic animal disease diagnosis and treatment was proposed using BERT+CaBiLSTM+CRF (Bidirectional Encoder Representations from Transformers + Cascade Bi-directional Long Short-Term Memory + Conditional Random Field). Firstly, because the features produced by the BERT model contain position vector information, the model can distinguish the different meanings that an entity expresses in different contexts, which mitigates the polysemy problem. Secondly, the CaBiLSTM model was designed for nested named entity recognition using a “hierarchical idea”, because the inner entity of a nested entity in aquatic medicine contributes greatly to the recognition of the outer entity. A BiLSTM+CRF model was first used to identify the frequently occurring inner entities. The feature matrix of the identified inner entities was then reduced in dimension and concatenated with the outer entity feature matrix, so that the complete inner entity feature information was retained. After that, a BiLSTM+CRF model was used for outer entity recognition, improving the discrimination and recognition accuracy of outer entities. Finally, a comparative experiment was designed to verify the effectiveness of the proposed method.
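The feature-fusion step of the cascade (reduce the inner entity features, then concatenate them with the outer entity features) can be illustrated with a minimal sketch. The abstract does not say how the dimension reduction is performed, so the linear projection here is an assumption, as are the names `project` and `cascade_concat` and the sizes `T`, `d`, and `k`. Random numbers stand in for the BiLSTM outputs, and the BERT encoder and CRF layers are omitted.

```python
import random


def project(features, weights):
    """Multiply a (T x d) feature matrix by a (d x k) weight matrix."""
    return [
        [sum(x * w for x, w in zip(row, col)) for col in zip(*weights)]
        for row in features
    ]


def cascade_concat(inner_feats, outer_feats, weights):
    """Reduce inner-entity features to k dims and append them to the
    outer-entity features, token by token (assumed linear reduction)."""
    if len(inner_feats) != len(outer_feats):
        raise ValueError("inner and outer features must cover the same tokens")
    reduced = project(inner_feats, weights)
    return [outer_row + reduced_row
            for outer_row, reduced_row in zip(outer_feats, reduced)]


random.seed(0)
T, d, k = 5, 8, 3  # hypothetical: 5 tokens, 8-dim features, reduced to 3 dims

# Placeholders for the per-token features of the inner and outer layers.
inner = [[random.random() for _ in range(d)] for _ in range(T)]
outer = [[random.random() for _ in range(d)] for _ in range(T)]
W = [[random.random() for _ in range(k)] for _ in range(d)]

fused = cascade_concat(inner, outer, W)
print(len(fused), len(fused[0]))  # 5 11
```

Each token keeps its original outer features and gains `k` extra dimensions derived from the inner entity layer, which is what lets the outer recognizer see the inner entity information.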
The test results show that the accuracy, recall, and F1 value of the BERT+CaBiLSTM+CRF model on the named entity recognition task in aquatic medicine reached 93.07%, 92.85%, and 92.96%, respectively. In terms of specific entity categories, the five types of non-nested entities (aquatic animal names, drug names, disease names, disease sites, and pathogens) have prominent structural features. For example, most aquatic animal names contain radicals such as “worm” and “fish”, drug names are mostly composed of chemical elements, and disease names mostly end with the word “disease”. These entities were therefore recognized with higher accuracy than the nested entities. For entities with a prominent nested structure, such as clinical symptoms, the model integrating BERT and the CaBiLSTM designed with the “hierarchical idea” also achieved higher recognition than before: the recognition accuracy, recall, and F1 value increased by 12.31, 12.76, and 12.53 percentage points, respectively. Therefore, the model can be expected to reduce the recognition errors caused by ambiguity and entity nesting in named entity recognition for aquatic animal disease diagnosis and treatment. The findings can support the construction of a knowledge graph for the fisheries field and further promote healthy aquaculture. [ABSTRACT FROM AUTHOR]
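As a consistency check on the headline figures, the F1 value can be recomputed as the harmonic mean of the two other metrics. This sketch assumes that the reported "accuracy" plays the role of precision in the F1 formula, which the abstract does not state explicitly. The function name `f1_score` is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported values from the abstract, in percent.
f1 = f1_score(93.07, 92.85)
print(round(f1, 2))  # 92.96
```

Under that assumption, the recomputed value matches the reported 92.96%.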
- Published
- 2022