1. ASaRE-Net: automatic information extraction from Al-Si alloy materials science literature for corpus construction.
- Author
- Liu, Yingli; Wen, Shaojie; Yin, Jiancheng; Zhou, Haihe
- Subjects
NATURAL language processing, SCIENTIFIC literature, KNOWLEDGE graphs, KNOWLEDGE representation (Information theory), DATA mining
- Abstract
The scientific literature on Al-Si alloy materials is a rich source of factual information about various classes of entities (e.g., alloys, properties, and compositions) and the relationships between them. Automatically extracting this information with natural language processing (NLP) techniques to build a knowledge graph for Al-Si alloy materials is a challenging task. In this paper, we propose ASaRE-Net (Al-Si alloy Relationship Extraction Network) to extract entities and relationships from Chinese scientific literature on Al-Si alloy materials, forming triples (Entity 1, Relation, Entity 2) as the basic units of a knowledge graph. Because triples often overlap and a single sentence may contain many triples, this paper designs a triple-aware module based on knowledge representation learning to mine the correlation between entities and relations, and decodes with a simple and efficient labeling strategy to alleviate the problem of cascading errors. This paper defines 11 entity types and 13 relationship types for Al-Si alloy materials and constructs ASaIED (Al-Si alloy Information Extraction Dataset). Experimental results show that, compared with the strongest baseline, the proposed model improves the F1 score by 2.34% on the ASaIED dataset and performs better in scenarios involving single entity overlap (SEO) and sentences with many triples. The proposed model can continuously provide credible data for constructing knowledge graphs in the field of Al-Si alloy materials. [ABSTRACT FROM AUTHOR]
- Published
- 2024
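- Illustration
The triple unit (Entity 1, Relation, Entity 2) and the overlap scenarios mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the ASaRE-Net method: the `Triple` type, the `classify_overlap` helper, and the example entity and relation names are all hypothetical and are not drawn from ASaIED. It assumes the common convention in relation extraction that two triples sharing both entities are an entity pair overlap (EPO), while triples sharing only one entity are a single entity overlap (SEO).

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One knowledge-graph unit: (Entity 1, Relation, Entity 2)."""
    head: str
    relation: str
    tail: str


def classify_overlap(triples: List[Triple]) -> List[str]:
    """Label each triple in one sentence as 'EPO', 'SEO', or 'Normal'.

    Simplifying assumption: EPO takes precedence over SEO when both apply.
    """
    triples = list(triples)
    labels = []
    for i, t in enumerate(triples):
        others = triples[:i] + triples[i + 1:]
        pair = {t.head, t.tail}
        if any({o.head, o.tail} == pair for o in others):
            labels.append("EPO")      # same entity pair, another triple
        elif any(pair & {o.head, o.tail} for o in others):
            labels.append("SEO")      # exactly one entity is shared
        else:
            labels.append("Normal")   # no entity shared with any other triple
    return labels


# Hypothetical triples from a single sentence (names are illustrative only).
EXAMPLE = [
    Triple("alloy_X", "has_property", "tensile strength"),
    Triple("alloy_X", "contains", "Si"),
    Triple("heat treatment", "improves", "hardness"),
]

if __name__ == "__main__":
    for triple, label in zip(EXAMPLE, classify_overlap(EXAMPLE)):
        print(label, triple)
```

The first two example triples share the entity `alloy_X`, so they fall under SEO, while the third shares nothing with the others. A sentence with many such triples is the case where a pipeline that first extracts entities and then classifies relations tends to propagate errors from one stage to the next.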