
SEED: Symptom Extraction from English Social Media Posts using Deep Learning and Transfer Learning

Authors :
Arjun Magge
Davy Weissenbacher
Karen O’Connor
Matthew Scotch
Graciela Gonzalez-Hernandez
Source :
medRxiv (preprint, version 1)
Publication Year :
2021
Publisher :
Cold Spring Harbor Laboratory, 2021.

Abstract

The growth of social media use across the globe has fueled public health research efforts to mine valuable information, such as medication use, adverse drug effects, and reports of viral infections, that directly and indirectly affects human health. Despite its significance, such information can be extremely rare on social media. Mining these non-traditional sources for disease monitoring requires natural language processing techniques that extract symptom mentions and normalize them to standard terminologies for interpretability. In this work, we present the first version of a social media mining tool called SEED that detects symptom and disease mentions in social media posts from platforms such as Twitter and DailyStrength and normalizes them to the UMLS terminology. Using multi-corpus training and deep learning models, the tool achieves an overall F1 score of 0.85 for extracting symptom mentions on a health forum dataset and an F1 score of 0.72 on a balanced Twitter dataset, significantly improving over previous systems on these datasets. We apply the tool to recently collected Twitter posts that self-report COVID-19 symptoms to observe whether SEED can extract novel diseases and symptoms that were absent from the training data. In doing so, we describe the tool's advantages and shortcomings and suggest techniques to overcome its limitations. The results also draw attention to the potential of multi-corpus training for improving performance and to the need for continual training on newly obtained data to maintain consistent performance amid the ever-changing vocabulary of social media.
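The abstract reports extraction quality as an F1 score, the harmonic mean of precision and recall (F1 = 2PR / (P + R)). The following is a minimal sketch of how such a score can be computed for mention extraction. It assumes strict matching, in which a predicted mention counts only if its post ID and character offsets exactly equal those of a gold mention. The abstract does not state which matching criterion SEED's evaluation uses. The `(post_id, start, end)` span format and the `span_f1` function name are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical mention format: (post_id, start_char, end_char).
Span = Tuple[str, int, int]


def span_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under strict exact-span matching."""
    # True positives: predicted spans that exactly match a gold span.
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    # Guard against division by zero when nothing matches.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = {("t1", 0, 8), ("t1", 20, 25), ("t2", 5, 12)}
    predicted = {("t1", 0, 8), ("t2", 5, 12), ("t2", 30, 35)}
    # Two of three predictions match two of three gold spans.
    print(span_f1(gold, predicted))
```

A relaxed (overlap-based) criterion would count partially matching spans as correct and generally yields higher scores, so reported F1 values are only comparable when the matching rule is the same.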

Details

Database :
OpenAIRE
Journal :
medRxiv (preprint, version 1)
Accession number :
edsair.doi.dedup.....bdd316d05e210936a1384933e6c1abc7
Full Text :
https://doi.org/10.1101/2021.02.09.21251454