Back to Search Start Over

Filtering big data from social media--Building an early warning system for adverse drug reactions.

Authors :
Yang M
Kiang M
Shang W
Source :
Journal of biomedical informatics [J Biomed Inform] 2015 Apr; Vol. 54, pp. 230-40. Date of Electronic Publication: 2015 Feb 14.
Publication Year :
2015

Abstract

Objectives: Adverse drug reactions (ADRs) are believed to be a leading cause of death in the world. Pharmacovigilance systems are aimed at early detection of ADRs. With the popularity of social media, Web forums and discussion boards become important sources of data for consumers to share their drug use experience, as a result may provide useful information on drugs and their adverse reactions. In this study, we propose an automated ADR related posts filtering mechanism using text classification methods. In real-life settings, ADR related messages are highly distributed in social media, while non-ADR related messages are unspecific and topically diverse. It is expensive to manually label a large amount of ADR related messages (positive examples) and non-ADR related messages (negative examples) to train classification systems. To mitigate this challenge, we examine the use of a partially supervised learning classification method to automate the process.<br />Methods: We propose a novel pharmacovigilance system leveraging a Latent Dirichlet Allocation modeling module and a partially supervised classification approach. We select drugs with more than 500 threads of discussion, and collect all the original posts and comments of these drugs using an automatic Web spidering program as the text corpus. Various classifiers were trained by varying the number of positive examples and the number of topics. The trained classifiers were applied to 3000 posts published over 60 days. Top-ranked posts from each classifier were pooled and the resulting set of 300 posts was reviewed by a domain expert to evaluate the classifiers.<br />Results: Compare to the alternative approaches using supervised learning methods and three general purpose partially supervised learning methods, our approach performs significantly better in terms of precision, recall, and the F measure (the harmonic mean of precision and recall), based on a computational experiment using online discussion threads from Medhelp.<br />Conclusions: Our design provides satisfactory performance in identifying ADR related posts for post-marketing drug surveillance. The overall design of our system also points out a potentially fruitful direction for building other early warning systems that need to filter big data from social media networks.<br /> (Copyright © 2015 Elsevier Inc. All rights reserved.)

Details

Language :
English
ISSN :
1532-0480
Volume :
54
Database :
MEDLINE
Journal :
Journal of biomedical informatics
Publication Type :
Academic Journal
Accession number :
25688695
Full Text :
https://doi.org/10.1016/j.jbi.2015.01.011