Constructing fine-grained entity recognition corpora based on clinical records of traditional Chinese medicine
- Source :
- BMC Medical Informatics and Decision Making, Vol 20, Iss 1, Pp 1-17 (2020)
- Publication Year :
- 2020
- Publisher :
- Springer Science and Business Media LLC, 2020.
Abstract
- Background: In this study, we focus on building a fine-grained entity annotation corpus, together with the corresponding annotation guideline, from traditional Chinese medicine (TCM) clinical records. Our aim is to provide a basis for the fine-grained corpus construction of TCM clinical records in the future.
Methods: We developed a four-step approach suited to constructing a corpus from TCM medical records. First, we determined the entity types included in this study through sample annotation. Then, we drafted a fine-grained annotation guideline by summarizing the characteristics of the dataset and referring to some existing guidelines. We iteratively updated the guideline until the inter-annotator agreement (IAA) exceeded a Cohen’s kappa value of 0.9. Finally, comprehensive annotation was performed while keeping the IAA value above 0.9.
Results: We annotated the 10,197 clinical records in five rounds. Four entity categories involving 13 entity types were employed. The final fine-grained annotated entity corpus consists of 1104 entities and 67,799 tokens. The final IAA is 0.936 on average (for three annotators), indicating that the fine-grained entity recognition corpus is of high quality.
Conclusions: These results will provide a foundation for future research on corpus construction and named entity recognition tasks in the TCM clinical domain.
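The abstract reports agreement as Cohen’s kappa, averaged over three annotators. The record does not say how the labels were aligned or how the average was taken, so the following is only a minimal sketch under two assumptions: annotators are compared on per-token labels, and the reported figure is the mean over all annotator pairs. The function names and the example labels (`symptom`, `herb`, `O`) are hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def mean_pairwise_kappa(annotations):
    """Average Cohen's kappa over every pair of annotators (assumed averaging scheme)."""
    pairs = list(combinations(annotations, 2))
    if not pairs:
        raise ValueError("need at least two annotators")
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical per-token labels from three annotators.
annotator_1 = ["symptom", "symptom", "O", "herb", "O", "O"]
annotator_2 = ["symptom", "O", "O", "herb", "O", "O"]
annotator_3 = ["symptom", "symptom", "O", "herb", "O", "O"]

average_iaa = mean_pairwise_kappa([annotator_1, annotator_2, annotator_3])
print(f"mean pairwise kappa: {average_iaa:.3f}")
```

Under this reading, a guideline round would be repeated until `average_iaa` rises above the 0.9 threshold mentioned in the abstract.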
- Subjects :
- Guideline development
Computer science
Health Informatics
Traditional Chinese medicine
lcsh:Computer applications to medicine. Medical informatics
Domain (software engineering)
03 medical and health sciences
Annotation
0302 clinical medicine
Named-entity recognition
030212 general & internal medicine
Medicine, Chinese Traditional
030304 developmental biology
Corpus construction
0303 health sciences
Kappa value
Health Policy
Fine-grained annotation
Guideline
Computer Science Applications
TCM clinical records
lcsh:R858-859.7
Artificial intelligence
Clinical record
Natural language processing
Research Article
Details
- ISSN :
- 1472-6947
- Volume :
- 20
- Database :
- OpenAIRE
- Journal :
- BMC Medical Informatics and Decision Making
- Accession number :
- edsair.doi.dedup.....e344cbd61687f343ef3af6be5b63398c
- Full Text :
- https://doi.org/10.1186/s12911-020-1079-2