SEED: Symptom Extraction from English Social Media Posts using Deep Learning and Transfer Learning
- Source :
- medRxiv, article-version (status) pre, article-version (number) 1
- Publication Year :
- 2021
- Publisher :
- Cold Spring Harbor Laboratory, 2021.
Abstract
- The increase in social media usage across the globe has fueled efforts in public health research to mine valuable information, such as medication use, adverse drug effects, and reports of viral infections, that directly and indirectly affect human health. Despite its significance, such information can be incredibly rare on social media. Mining such non-traditional sources for disease monitoring requires natural language processing techniques for extracting symptom mentions and normalizing them to standard terminologies for interpretability. In this work, we present the first version of a social media mining tool called SEED that detects symptom and disease mentions in posts from social media platforms such as Twitter and DailyStrength and further normalizes them to the UMLS terminology. Using multi-corpus training and deep learning models, the tool achieves an overall F1 score of 0.85 for extracting mentions of symptoms on a health forum dataset and an F1 score of 0.72 on a balanced Twitter dataset, significantly improving over previous systems on these datasets. We apply the tool to recently collected Twitter posts that self-report COVID-19 symptoms to observe whether the SEED system can extract novel diseases and symptoms that were absent from the training data. In doing so, we describe the advantages and shortcomings of the tool and suggest techniques to overcome its limitations. The study results also draw attention to the potential of multi-corpus training for performance improvements and to the need for continual training on newly obtained data to maintain consistent performance amid the ever-changing nature of the social media vocabulary.
- Subjects :
- Vocabulary
Computer science
Deep learning
Unified Medical Language System
Population health
Social media mining
Data science
Article
Terminology
Pharmacovigilance
Social media
Artificial intelligence
Transfer of learning
Information Extraction
Natural Language Processing
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....bdd316d05e210936a1384933e6c1abc7
- Full Text :
- https://doi.org/10.1101/2021.02.09.21251454