
Text Recognition of Al-Si Alloy Literature Based on Transfer Learning.

Authors :
LIU Yingli
LI Wuliang
NIU Chen
YAO Changhui
YIN Jiancheng
SHEN Tao
Source :
Silicone Material; Aug2022, Vol. 36 Issue 4, p640-667, 7p
Publication Year :
2022

Abstract

In recent years, the Materials Genome Initiative (MGI) has attracted worldwide attention. Scarce data sources and inconsistent data storage practices have left the materials field short of structured data suitable for training machine learning models, which has become a bottleneck for researchers predicting material performance. As materials science continues to develop, the large amount of information contained in materials literature has drawn researchers' attention and has become the main data source for applying machine learning in the field. Obtaining a large amount of effective materials data is therefore a new challenge. This article uses natural language processing to extract valid data from the aluminum-silicon alloy literature. Named entity recognition, an important subtask of natural language processing, aims to identify meaningful entities in text. Five types of entities are selected from the materials science literature, and an aluminum-silicon alloy entity recognition data set is constructed by manual annotation, comprising 5,347 sentences and 2,835 entities. To reduce the dependence of natural language processing tasks on annotated corpora, transfer learning is used: a language model is pre-trained and then applied to the domain-specific task. Taking entity characteristics into account, the ALBERT (A Lite BERT) pre-trained language model is modeled jointly with conditional random fields (CRF), and the pre-trained model is applied to alloy material entity recognition with active learning. With a small number of labeled training samples, adding active learning increases the model's F1 value, accuracy rate, and recall rate by 0.61%, 2.68%, and 0.29%, respectively.
Experiments show that combining pre-training with active learning further reduces the dependence of entity recognition models on labeled data and lowers the cost of manual labeling. The results can help address the problem of materials data islands and ease the small-scale data set dilemma facing machine learning for the materials genome. They are expected to promote the development of aluminum-silicon alloys and provide a scientific basis for the design of new materials for the materials genome. [ABSTRACT FROM AUTHOR]
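
The abstract does not say which query strategy the active learning step uses. As an illustration only, the sketch below assumes least-confidence sampling: each unlabeled sentence is scored by how unsure a tagger is about its tokens, and the most uncertain sentences are sent for manual annotation first. The function names, the toy pool, and the per-token probability lists are hypothetical and stand in for the output of a model such as ALBERT-CRF; they are not taken from the paper.

```python
from typing import Dict, List, Sequence


def least_confidence(token_probs: Sequence[Sequence[float]]) -> float:
    """Score a sentence as 1 minus the mean top-label probability of its tokens.

    token_probs[i] is an assumed per-label probability distribution
    that some tagger assigns to token i (hypothetical input format).
    """
    if not token_probs:
        return 0.0
    top = [max(dist) for dist in token_probs]
    return 1.0 - sum(top) / len(top)


def select_for_annotation(
    pool: Dict[str, Sequence[Sequence[float]]], budget: int
) -> List[str]:
    """Return the ids of the `budget` most uncertain sentences in the pool."""
    ranked = sorted(
        pool, key=lambda sid: least_confidence(pool[sid]), reverse=True
    )
    return ranked[:budget]


# Toy unlabeled pool: three two-token sentences, three labels per token.
pool = {
    "s1": [[0.98, 0.01, 0.01], [0.95, 0.03, 0.02]],  # confident
    "s2": [[0.40, 0.35, 0.25], [0.50, 0.30, 0.20]],  # very uncertain
    "s3": [[0.70, 0.20, 0.10], [0.90, 0.05, 0.05]],  # in between
}

picked = select_for_annotation(pool, budget=2)
print(picked)
```

In a loop of this kind, the selected sentences would be labeled by hand, added to the training set, and the model retrained before the next round of selection, which is how a small labeled set can be grown at low annotation cost.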

Details

Language :
Chinese
ISSN :
10094369
Volume :
36
Issue :
4
Database :
Complementary Index
Journal :
Silicone Material
Publication Type :
Academic Journal
Accession number :
159134261
Full Text :
https://doi.org/10.14136/j.cnki.issn1673-2812.2022.04.015