Domain-Specific Chinese Word Segmentation Based on Bi-Directional Long-Short Term Memory Model
- Source :
- IEEE Access, Vol 7, Pp 12993-13002 (2019)
- Publication Year :
- 2019
- Publisher :
- IEEE, 2019.
Abstract
- Most current word segmentation methods are rule-based or rely on traditional machine learning. General-purpose word segmentation tools perform poorly in specialized fields such as metallurgy, and domain-specific Chinese word segmentation has rarely been studied. In recent years, with the development of deep learning, neural networks have proved effective for Chinese word segmentation. However, this performance relies on large-scale training data: neural networks with conventional architectures cannot achieve the desired results on low-resource datasets because labeled training data are lacking. Taking metallurgy as an example, this paper proposes a domain-specific Chinese word segmentation method based on the Bi-directional long-short term memory (Bi-directional LSTM) model. First, the word segmentation model is obtained by training a Bi-directional LSTM on knowledge from inside and outside the domain. Then, a series of parameters is tuned and the label probabilities of each word are combined with weights. Finally, the segmentation result is produced by a label inference layer. The experimental results show that the proposed method achieves better word segmentation in the field of metallurgy.
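The last two steps of the abstract (weighted combination of label probabilities, then label inference) can be illustrated with a small sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: the BMES tagging scheme, linear interpolation with a single `weight`, Viterbi decoding as the inference step, and the hand-written probabilities that stand in for the outputs of two trained Bi-directional LSTM models. It is not the authors' implementation.

```python
import math

# Assumed tagging scheme: Begin, Middle, End of a word, or Single-character word.
TAGS = ["B", "M", "E", "S"]
# Which tag may follow which (e.g. "B" must be followed by "M" or "E").
ALLOWED = {"B": {"M", "E"}, "M": {"M", "E"}, "E": {"B", "S"}, "S": {"B", "S"}}
START = ("B", "S")  # tags that may open a sentence
END = ("E", "S")    # tags that may close a sentence


def combine(p_domain, p_general, weight):
    """Linearly interpolate two per-character tag distributions (assumed scheme)."""
    return [
        {t: weight * d[t] + (1.0 - weight) * g[t] for t in TAGS}
        for d, g in zip(p_domain, p_general)
    ]


def viterbi(probs):
    """Return the most probable tag sequence that respects the BMES constraints."""
    if not probs:
        return []
    neg_inf = float("-inf")

    def logp(x):
        return math.log(x) if x > 0 else neg_inf

    score = {t: logp(probs[0][t]) if t in START else neg_inf for t in TAGS}
    back = []
    for dist in probs[1:]:
        new_score, pointers = {}, {}
        for t in TAGS:
            # Best predecessor among the tags allowed to precede t.
            prev = max((p for p in TAGS if t in ALLOWED[p]), key=lambda p: score[p])
            new_score[t] = score[prev] + logp(dist[t])
            pointers[t] = prev
        back.append(pointers)
        score = new_score
    path = [max(END, key=lambda t: score[t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def segment(text, tags):
    """Cut the text after every character tagged E or S."""
    words, current = [], ""
    for ch, tag in zip(text, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


# Made-up probabilities for the four characters of "高炉炼铁" (blast-furnace ironmaking).
head = {"B": 0.8, "M": 0.05, "E": 0.05, "S": 0.1}
tail = {"B": 0.05, "M": 0.05, "E": 0.8, "S": 0.1}
p_domain = [head, tail, head, tail]                           # hypothetical in-domain model
p_general = [{"B": 0.2, "M": 0.1, "E": 0.2, "S": 0.5}] * 4    # hypothetical general model

tags = viterbi(combine(p_domain, p_general, weight=0.7))
print(tags, segment("高炉炼铁", tags))
```

In this sketch a larger `weight` trusts the in-domain model more; setting it to 0 falls back to the general model alone. How the paper actually chooses and applies its weights is not stated in the abstract.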
- Subjects :
- General Computer Science
Computer science
Inference
0102 computer and information sciences
02 engineering and technology
01 natural sciences
Field (computer science)
0202 electrical engineering, electronic engineering, information engineering
General Materials Science
Segmentation
domain-specific
Artificial neural network
business.industry
Deep learning
Text segmentation
General Engineering
Pattern recognition
combination of weight
Bi-directional long-short term memory (Bi-directional LSTM) model
010201 computation theory & mathematics
Domain knowledge
020201 artificial intelligence & image processing
Artificial intelligence
lcsh:Electrical engineering. Electronics. Nuclear engineering
business
Chinese word segmentation
lcsh:TK1-9971
Word (computer architecture)
Details
- Language :
- English
- ISSN :
- 21693536
- Volume :
- 7
- Database :
- OpenAIRE
- Journal :
- IEEE Access
- Accession number :
- edsair.doi.dedup.....7dba26a3835fbca2402d799b7169b9c7