1. An Exploratory Study on Pseudo-Data Generation in Prescription and Adverse Drug Reaction Extraction.
- Author
- Carson Tao, Kahyun Lee, Michele Filannino, and Özlem Uzuner
- Subjects
DRUG side effects, DATA mining, NATURAL language processing, KNOWLEDGE base, INFORMATION retrieval
- Abstract
Prescription information and adverse drug reactions (ADR) are two components of detailed medication instructions that can benefit many aspects of clinical research. Automatic extraction of this information from free-text narratives via Information Extraction (IE) can open it up to downstream uses. IE is commonly tackled by supervised Natural Language Processing (NLP) systems which rely on annotated training data. However, training data generation is manual, time-consuming, and labor-intensive. It is desirable to develop automatic methods for augmenting manually labeled data. We propose pseudo-data generation as one such automatic method. Pseudo-data are synthetic data generated by combining elements of existing labeled data. We propose and evaluate two sets of pseudo-data generation methods: knowledge-driven methods based on gazetteers and data-driven methods based on deep learning. We use the resulting pseudo-data to improve medication and ADR extraction. Data-driven pseudo-data are suitable for concept categories with high semantic regularities and short textual spans. Knowledge-driven pseudo-data are effective for concept categories with longer textual spans, assuming the knowledge base offers good coverage of these concepts. Combining the knowledge- and data-driven pseudo-data achieves significant performance improvement on medication names and ADRs over baselines limited to the use of available labeled data. [ABSTRACT FROM AUTHOR]
- Published
- 2019