Chinese clinical named entity recognition with variant neural structures based on BERT methods
- Source :
- Journal of biomedical informatics. 107
- Publication Year :
- 2019
Abstract
- Clinical Named Entity Recognition (CNER) is a critical task that aims to identify and classify clinical terms in electronic medical records. In recent years, deep neural networks have achieved significant success in CNER. However, these methods require large-scale, high-quality labeled clinical data, which is challenging and expensive to obtain, especially for Chinese clinical records. To tackle the Chinese CNER task, we pre-train a BERT model on unlabeled Chinese clinical records, which allows the model to leverage domain-specific knowledge from unlabeled text. Different layers, such as Long Short-Term Memory (LSTM) and Conditional Random Field (CRF) layers, are used to extract text features and to decode the predicted tags, respectively. In addition, we propose a new strategy to incorporate dictionary features into the model. Radical features of Chinese characters are also used to improve model performance. To the best of our knowledge, our ensemble model outperforms state-of-the-art models, achieving a strict F1 score of 89.56% on the CCKS-2018 dataset and an F1 score of 91.60% on the CCKS-2017 dataset.
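The abstract reports a *strict* F1 score. Under strict matching, a predicted entity counts as correct only if its start position, end position, and entity type all match a gold entity exactly; a prediction with the right type but a boundary off by one character counts as both a false positive and a false negative. The sketch below illustrates that computation under stated assumptions: it assumes a BIO tagging scheme (the paper's actual scheme is not given in this record), uses hypothetical entity labels (`DRUG`, `BODY`) and hypothetical function names (`bio_to_spans`, `strict_f1`), and micro-averages counts over all sentences. It is not the authors' evaluation script or the official CCKS scorer.

```python
from typing import List, Optional, Set, Tuple

# (start, end_exclusive, entity_type); a hypothetical span representation
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence (assumed scheme) into a set of entity spans."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    # A sentinel "O" closes any entity still open at the end of the sequence.
    for i, tag in enumerate(tags + ["O"]):
        continues_entity = tag.startswith("I-") and tag[2:] == etype
        if not continues_entity:
            # Close the currently open entity, if any.
            if start is not None:
                spans.add((start, i, etype))
                start, etype = None, None
            # A B- tag starts an entity; a stray I- tag (no matching open
            # entity) is also treated as starting a new one in this sketch.
            if tag.startswith("B-") or tag.startswith("I-"):
                start, etype = i, tag[2:]
    return spans


def strict_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 where an entity is correct only on exact span and type match."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)  # exact (start, end, type) matches only
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [["B-DRUG", "I-DRUG", "O", "B-BODY", "O"]]
    # The BODY entity's right boundary is one character too long.
    pred = [["B-DRUG", "I-DRUG", "O", "B-BODY", "I-BODY"]]
    print(strict_f1(gold, pred))  # 0.5
```

In the usage example, the `DRUG` entity matches exactly, but the predicted `BODY` span `(3, 5)` differs from the gold span `(3, 4)`, so precision and recall are both 1/2 and the strict F1 is 0.5. A relaxed metric that accepts overlapping spans would score this prediction higher, which is why strict and relaxed F1 figures are not directly comparable.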
- Subjects :
- Conditional random field
China
Computer science
Health Informatics
computer.software_genre
03 medical and health sciences
0302 clinical medicine
Named-entity recognition
Leverage (statistics)
Electronic Health Records
030212 general & internal medicine
030304 developmental biology
0303 health sciences
Text Messaging
Ensemble forecasting
business.industry
Computer Science Applications
Deep neural networks
Artificial intelligence
Neural Networks, Computer
Chinese characters
F1 score
business
computer
Clinical record
Natural language processing
Details
- ISSN :
- 15320480
- Volume :
- 107
- Database :
- OpenAIRE
- Journal :
- Journal of biomedical informatics
- Accession number :
- edsair.doi.dedup.....4b3fb66b7a7722620b35b961a84d8503