23 results on "Social media mining"
Search Results
2. COVID-19 Event Extraction from Twitter via Extractive Question Answering with Continuous Prompts.
- Author
- Jiang Y and Kavuluru R
- Subjects
- Humans, Benchmarking, Disclosure, Health Facilities, COVID-19, Social Media
- Abstract
As COVID-19 ravages the world, social media analytics could augment traditional surveys in assessing how the pandemic evolves and capturing consumer chatter that could help healthcare agencies in addressing it. This typically involves mining disclosure events that mention testing positive for the disease or discussions surrounding perceptions and beliefs in preventative or treatment options. The 2020 shared task on COVID-19 event extraction (conducted as part of the W-NUT workshop during the EMNLP conference) introduced a new Twitter dataset for benchmarking event extraction from COVID-19 tweets. In this paper, we cast the problem of event extraction as extractive question answering using recent advances in continuous prompting in language models. On the shared task test dataset, our approach leads to over 5% absolute micro-averaged F1-score improvement over prior best results, across all COVID-19 event slots. Our ablation study shows that continuous prompts have a major impact on the eventual performance.
- Published
- 2024
- Full Text
- View/download PDF
3. Social media-based urban disaster recovery and resilience analysis of the Henan deluge.
- Author
- Shan S and Zhao F
- Abstract
Measuring disaster resilience from the perspective of long-term recovery ability is important for the planning and construction of urban sustainability, whereas short-term resilient recovery can better reflect a city's ability to recover quickly after a disaster occurs. This study proposes an analytical framework for urban disaster recovery and resilience based on social media data that can analyze short-term disaster recovery and assess disaster resilience from the perspectives of infrastructure and people's psychological states. We consider the downpour in Henan, China, in July 2021. The results show that (1) social media data can effectively reflect short-term disaster recovery, (2) disaster resilience can be assessed using social media data combined with rainfall and damage data, and (3) the framework can quantitatively reflect the differences in disaster recovery and resilience across regions. The findings can facilitate better decision-making in disaster emergency management for precise and effective post-disaster reconstruction and psychological intervention, and provide references for cities to improve disaster resilience., Competing Interests: Conflict of interest: The authors declare that we have no financial or personal relationships with other people or organizations that can inappropriately influence our work, and there is no professional or other personal interest of any nature or kind in any product, service, or company that could be construed as influencing the position presented in, or the review of, the manuscript., (© The Author(s), under exclusive licence to Springer Nature B.V. 2023, Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.)
- Published
- 2023
- Full Text
- View/download PDF
4. How do others cope? Extracting coping strategies for adverse drug events from social media.
- Author
- Dirkson A, Verberne S, van Oortmerssen G, Gelderblom H, and Kraaij W
- Subjects
- Humans, Natural Language Processing, Social Media, Gastrointestinal Stromal Tumors, Drug-Related Side Effects and Adverse Reactions
- Abstract
Patients advise their peers on how to cope with their illness in daily life on online support groups. To date, no efforts have been made to automatically extract recommended coping strategies from online patient discussion groups. We introduce this new task, which poses a number of challenges including complex, long entities, a large long-tailed label space, and cross-document relations. We present an initial ontology for coping strategies as a starting point for future research on coping strategies, and the first end-to-end pipeline for extracting coping strategies for side effects. We also compared two possible computational solutions for this novel and highly challenging task; multi-label classification and named entity recognition (NER) with entity linking (EL). We evaluated our methods on the discussion forum from the Facebook group of the worldwide patient support organization 'GIST support international' (GSI); GIST support international donated the data to us. We found that coping strategy extraction is difficult and both methods attain limited performance (measured with F1 score) on held out test sets; multi-label classification outperforms NER+EL (F1 = 0.220 vs F1 = 0.155). An inspection of the multi-label classification output revealed that for some of the incorrect predictions, the reference label is close to the predicted label in the ontology (e.g. the predicted label 'juice' instead of the more specific reference label 'grapefruit juice'). Performance increased to F1 = 0.498 when we evaluated at a coarser level of the ontology. We conclude that our pipeline can be used in a semi-automatic setting, in interaction with domain experts to discover coping strategies for side effects from a patient forum. For example, we found that patients recommend ginger tea for nausea and magnesium and potassium supplements for cramps. This information can be used as input for patient surveys or clinical studies., Competing Interests: Declaration of Competing Interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper., (Copyright © 2022 The Author(s). Published by Elsevier Inc. All rights reserved.)
- Published
- 2023
- Full Text
- View/download PDF
5. A survey on the use of association rules mining techniques in textual social media.
- Author
- Diaz-Garcia JA, Ruiz MD, and Martin-Bautista MJ
- Abstract
The incursion of social media in our lives has been much accentuated in the last decade. This has led to a multiplication of data mining tools aimed at obtaining knowledge from these data sources. One of the greatest challenges in this area is to be able to obtain this knowledge without the need for training processes, which requires structured information and pre-labelled datasets. This is where unsupervised data mining techniques come in. These techniques can obtain value from these unstructured and unlabelled data, providing very interesting solutions to enhance the decision-making process. In this paper, we first address the problem of social media mining, as well as the need for unsupervised techniques, in particular association rules, for its treatment. We follow with a broad overview of the applications of association rules in the domain of social media mining, specifically, their application to the problems of mining textual entities, such as tweets. We also focus on the strengths and weaknesses of using association rules for solving different tasks in textual social media. Finally, the paper provides a perspective overview of the challenges that association rules must face in the next decade within the field of social media mining., (© The Author(s) 2022.)
- Published
- 2023
- Full Text
- View/download PDF
6. Social Media Mining of Long-COVID Self-Medication Reported by Reddit Users: Feasibility Study to Support Drug Repurposing.
- Author
- Koss J and Bohnet-Joschko S
- Abstract
Background: Since the beginning of the COVID-19 pandemic, over 480 million people have been infected and more than 6 million people have died from COVID-19 worldwide. In some patients with acute COVID-19, symptoms manifest over a longer period, which is also called "long-COVID." Unmet medical needs related to long-COVID are high, since there are no treatments approved. Patients experiment with various medications and supplements hoping to alleviate their suffering. They often share their experiences on social media., Objective: The aim of this study was to explore the feasibility of social media mining methods to extract important compounds from the perspective of patients. The goal is to provide an overview of different medication strategies and important agents mentioned in Reddit users' self-reports to support hypothesis generation for drug repurposing, by incorporating patients' experiences., Methods: We used named-entity recognition to extract substances representing medications or supplements used to treat long-COVID from almost 70,000 posts on the "/r/covidlonghaulers" subreddit. We analyzed substances by frequency, co-occurrences, and network analysis to identify important substances and substance clusters., Results: The named-entity recognition algorithm achieved an F1 score of 0.67. A total of 28,447 substance entities and 5789 word co-occurrence pairs were extracted. "Histamine antagonists," "famotidine," "magnesium," "vitamins," and "steroids" were the most frequently mentioned substances. Network analysis revealed three clusters of substances, indicating certain medication patterns., Conclusions: This feasibility study indicates that network analysis can be used to characterize the medication strategies discussed in social media. Comparison with existing literature shows that this approach identifies substances that are promising candidates for drug repurposing, such as antihistamines, steroids, or antidepressants. In the context of a pandemic, the proposed method could be used to support drug repurposing hypothesis development by prioritizing substances that are important to users., (©Jonathan Koss, Sabine Bohnet-Joschko. Originally published in JMIR Formative Research (https://formative.jmir.org), 03.10.2022.)
- Published
- 2022
- Full Text
- View/download PDF
7. Symptoms reported by gastrointestinal stromal tumour (GIST) patients on imatinib treatment: combining questionnaire and forum data.
- Author
- den Hollander D, Dirkson AR, Verberne S, Kraaij W, van Oortmerssen G, Gelderblom H, Oosten A, Reyners AKL, Steeghs N, van der Graaf WTA, Desar IME, and Husson O
- Subjects
- Cross-Sectional Studies, Fatigue epidemiology, Humans, Imatinib Mesylate adverse effects, Muscle Cramp, Protein Kinase Inhibitors adverse effects, Quality of Life, Surveys and Questionnaires, Gastrointestinal Stromal Tumors drug therapy
- Abstract
Purpose: Treatment with the tyrosine kinase inhibitor (TKI) imatinib in patients with gastrointestinal stromal tumours (GIST) causes symptoms that could negatively impact health-related quality of life (HRQoL). Treatment-related symptoms are usually clinician-reported and little is known about patient reports. We used survey and online patient forum data to investigate (1) prevalence of patient-reported symptoms; (2) coverage of symptoms mentioned on the forum by existing HRQoL questionnaires; and (3) priorities of prevalent symptoms in HRQoL assessment., Methods: In the cross-sectional population-based survey study, Dutch GIST patients completed items from the EORTC QLQ-C30 and Symptom-Based Questionnaire (SBQ). In the forum study, machine learning algorithms were used to extract TKI side-effects from English messages on an international online forum for GIST patients. Prevalence of symptoms related to imatinib treatment in both sources was calculated and exploratively compared., Results: Fatigue and muscle pain or cramps were reported most frequently. Seven out of 10 most reported symptoms (i.e. fatigue, muscle pain or cramps, facial swelling, joint pain, skin problems, diarrhoea, and oedema) overlapped between the two sources. Alopecia was frequently mentioned on the forum, but not in the survey. Four out of 10 most reported symptoms on the online forum are covered by the EORTC QLQ-C30. The EORTC-SBQ and EORTC Item Library cover 9 and 10 symptoms, respectively., Conclusion: This first overview of patient-reported imatinib-related symptoms from two data sources helps to determine coverage of items in existing questionnaires, and prioritize HRQoL issues. Combining cancer-generic instruments with treatment-specific item lists will improve future HRQoL assessment in care and research in GIST patients using TKI., (© 2022. The Author(s).)
- Published
- 2022
- Full Text
- View/download PDF
8. SEED: Symptom Extraction from English Social Media Posts using Deep Learning and Transfer Learning.
- Author
- Magge A, Weissenbacher D, O'Connor K, Scotch M, and Gonzalez-Hernandez G
- Abstract
The increase of social media usage across the globe has fueled efforts in digital epidemiology for mining valuable information such as medication use, adverse drug effects and reports of viral infections that directly and indirectly affect population health. Such specific information can, however, be scarce, hard to find, and mostly expressed in very colloquial language. In this work, we focus on a fundamental problem that enables social media mining for disease monitoring. We present and make available SEED, a natural language processing approach to detect symptom and disease mentions from social media data obtained from platforms such as Twitter and DailyStrength and to normalize them into UMLS terminology. Using multi-corpus training and deep learning models, the tool achieves an overall F1 score of 0.86 and 0.72 on DailyStrength and balanced Twitter datasets, significantly improving over previous approaches on the same datasets. We apply the tool on Twitter posts that report COVID19 symptoms, particularly to quantify whether the SEED system can extract symptoms absent in the training data. The study results also draw attention to the potential of multi-corpus training for performance improvements and the need for continuous training on newly obtained data for consistent performance amidst the ever-changing nature of the social media vocabulary.
- Published
- 2022
- Full Text
- View/download PDF
9. COVID-19 Surveiller: toward a robust and effective pandemic surveillance system based on social media mining.
- Author
- Jiang JY, Zhou Y, Chen X, Jhou YR, Zhao L, Liu S, Yang PC, Ahmar J, and Wang W
- Subjects
- Data Mining, Humans, Pandemics, SARS-CoV-2, COVID-19, Social Media
- Abstract
The outbreak of the novel coronavirus, COVID-19, has become one of the most severe pandemics in human history. In this paper, we propose to leverage social media users as social sensors to simultaneously predict the pandemic trends and suggest potential risk factors for public health experts to understand spread situations and recommend proper interventions. More precisely, we develop novel deep learning models to recognize important entities and their relations over time, thereby establishing dynamic heterogeneous graphs to describe the observations of social media users. A dynamic graph neural network model can then forecast the trends (e.g. newly diagnosed cases and death rates) and identify high-risk events from social media. Based on the proposed computational method, we also develop a web-based system for domain experts without any computer science background to easily interact with. We conduct extensive experiments on large-scale datasets of COVID-19 related tweets provided by Twitter, which show that our method can precisely predict the new cases and death rates. We also demonstrate the robustness of our web-based pandemic surveillance system and its ability to retrieve essential knowledge and derive accurate predictions across a variety of circumstances. Our system is also available at http://scaiweb.cs.ucla.edu/covidsurveiller/. This article is part of the theme issue 'Data science approaches to infectious disease surveillance'.
- Published
- 2022
- Full Text
- View/download PDF
10. Social media mining in drug development-Fundamentals and use cases.
- Author
- Koss J, Rheinlaender A, Truebel H, and Bohnet-Joschko S
- Subjects
- Artificial Intelligence, Drug Discovery methods, Humans, Patient-Centered Care, Data Mining methods, Drug Development methods, Social Media
- Abstract
The incorporation of patients' perspectives into drug discovery and development has become critically important from the viewpoint of accounting for modern-day business dynamics. There is a trend among patients to narrate their disease experiences on social media. The insights gained by analyzing the data pertaining to such social-media posts could be leveraged to support patient-centered drug development. Manual analysis of these data is nearly impossible, but artificial intelligence enables automated and cost-effective processing, also referred to as social media mining (SMM). This paper discusses the fundamental SMM methods along with several relevant drug-development use cases., (Copyright © 2021 The Authors. Published by Elsevier Ltd. All rights reserved.)
- Published
- 2021
- Full Text
- View/download PDF
11. DeepADEMiner: a deep learning pharmacovigilance pipeline for extraction and normalization of adverse drug event mentions on Twitter.
- Author
- Magge A, Tutubalina E, Miftahutdinov Z, Alimova I, Dirkson A, Verberne S, Weissenbacher D, and Gonzalez-Hernandez G
- Subjects
- Humans, Pharmacovigilance, Deep Learning, Drug-Related Side Effects and Adverse Reactions, Social Media
- Abstract
Objective: Research on pharmacovigilance from social media data has focused on mining adverse drug events (ADEs) using annotated datasets, with publications generally focusing on 1 of 3 tasks: ADE classification, named entity recognition for identifying the span of ADE mentions, and ADE mention normalization to standardized terminologies. While the common goal of such systems is to detect ADE signals that can be used to inform public policy, it has been impeded largely by limited end-to-end solutions for large-scale analysis of social media reports for different drugs., Materials and Methods: We present a dataset for training and evaluation of ADE pipelines where the ADE distribution is closer to the average 'natural balance' with ADEs present in about 7% of the tweets. The deep learning architecture involves an ADE extraction pipeline with individual components for all 3 tasks., Results: The system presented achieved state-of-the-art performance on comparable datasets and scored a classification performance of F1 = 0.63, span extraction performance of F1 = 0.44 and an end-to-end entity resolution performance of F1 = 0.34 on the presented dataset., Discussion: The performance of the models continues to highlight multiple challenges when deploying pharmacovigilance systems that use social media data. We discuss the implications of such models in the downstream tasks of signal detection and suggest future enhancements., Conclusion: Mining ADEs from Twitter posts using a pipeline architecture requires the different components to be trained and tuned based on input data imbalance in order to ensure optimal performance on the end-to-end resolution task., (© The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association.)
- Published
- 2021
- Full Text
- View/download PDF
12. An analysis of COVID-19 economic measures and attitudes: evidence from social media mining.
- Author
- Domalewska D
- Abstract
This paper explores the public perception of economic measures implemented as a reaction to the COVID-19 pandemic in Poland in March-June 2020. A mixed-method approach was used to analyse big data coming from tweets and Facebook posts related to the mitigation measures to provide evidence for longitudinal trends, correlations, theme classification and perception. The online discussion oscillated around political and economic issues. The implementation of the anti-crisis measures triggered a barrage of criticism pointing out the shortcomings and ineffectiveness of the solutions. The revised relief legislation was accompanied by a wide-reaching informative campaign about the relief package, which decreased negative sentiment. The analysis also showed that with regard to online discussion about risk mitigation, social media users are more concerned about short-term economic and social effects rather than long-term effects of the pandemic. The findings have significant implications for the understanding of public sentiment related to the COVID-19 pandemic, economic attitudes and relief support implemented to fight the adverse effects of the pandemic., Competing Interests: Competing interests: No potential competing interest was reported by the authors., (© The Author(s) 2021.)
- Published
- 2021
- Full Text
- View/download PDF
13. Learning structured medical information from social media.
- Author
- Hasan A, Levene M, and Weston D
- Subjects
- Algorithms, Humans, Language, Supervised Machine Learning, Social Media
- Abstract
Our goal is to summarise and aggregate information from social media regarding the symptoms of a disease, the drugs used and the treatment effects both positive and negative. To achieve this we first apply a supervised machine learning method to automatically extract medical concepts from natural language text. In an environment such as social media, where new data is continuously streamed, we need a methodology that will allow us to continuously train with the new data. To attain such incremental re-training, a semi-supervised methodology is developed, which is capable of learning new concepts from a small set of labelled data together with the much larger set of unlabelled data. The semi-supervised methodology deploys a conditional random field (CRF) as the base-line training algorithm for extracting medical concepts. The methodology iteratively augments to the training set sentences having high confidence, and adds terms to existing dictionaries to be used as features with the base-line model for further classification. Our empirical results show that the base-line CRF performs strongly across a range of different dictionary and training sizes; when the base-line is built with the full training data the F1 score reaches the range 84%-90%. Moreover, we show that the semi-supervised method produces a mild but significant improvement over the base-line. We also discuss the significance of the potential improvement of the semi-supervised methodology and found that it is significantly more accurate in most cases than the underlying base-line model., (Crown Copyright © 2020. Published by Elsevier Inc. All rights reserved.)
- Published
- 2020
- Full Text
- View/download PDF
14. A proof-of-concept study of extracting patient histories for rare/intractable diseases from social media.
- Author
- Yamaguchi A and Queralt-Rosinach N
- Abstract
The amount of content on social media platforms such as Twitter is expanding rapidly. Simultaneously, the lack of patient information seriously hinders the diagnosis and treatment of rare/intractable diseases. However, these patient communities are especially active on social media. Data from social media could serve as a source of patient-centric knowledge for these diseases complementary to the information collected in clinical settings and patient registries, and may also have potential for research use. To explore this question, we attempted to extract patient-centric knowledge from social media as a task for the 3-day Biomedical Linked Annotation Hackathon 6 (BLAH6). We selected amyotrophic lateral sclerosis and multiple sclerosis as use cases of rare and intractable diseases, respectively, and we extracted patient histories related to these health conditions from Twitter. Four diagnosed patients for each disease were selected. From the user timelines of these eight patients, we extracted tweets that might be related to health conditions. Based on our experiment, we show that our approach has considerable potential, although we identified problems that should be addressed in future attempts to mine information about rare/intractable diseases from Twitter.
- Published
- 2020
- Full Text
- View/download PDF
15. An empirical evaluation of electronic annotation tools for Twitter data.
- Author
- Weissenbacher D, O'Connor K, Hiraki AT, Kim JD, and Gonzalez-Hernandez G
- Abstract
Despite a growing number of natural language processing shared tasks dedicated to the use of Twitter data, there is currently no ad-hoc annotation tool for the purpose. During the 6th edition of BLAH, after a short review of 19 generic annotation tools, we adapted GATE and TextAE for annotating Twitter timelines. Although none of the tools reviewed allows the annotation of all information inherent in Twitter timelines, a few may be suitable provided annotators are willing to compromise on some functionality.
- Published
- 2020
- Full Text
- View/download PDF
16. Social media mining for smart cities and smart villages research.
- Author
- Lytras MD, Visvizi A, and Jussila J
- Abstract
The imperative of well-being and improved quality of life in smart cities context can only be attained if the smart services, so central to the concept of smart cities, correspond with the needs, expectations and skills of cities' inhabitants. Considering that social media generate and/or open real-time entry points to vast amounts of data pertinent to well-being and quality of life, such as citizens' expectations, opinions, as well as to recent developments related to regulatory frameworks, debates, political decisions and policymaking, the big question is how to exploit the potential inherent in social media and use it to enhance the value added smart cities generate. Social mining is traditionally understood as the process of representing, analyzing, and extracting actionable patterns and trends from raw social media data. In the context of smart cities, this special issue focuses on how social media data, also potentially combined with other data, can be used to optimize the efficiency of city operations and services, and thereby contribute more efficiently to citizens' well-being and quality of life., Competing Interests: Conflict of interest: The authors declare that they have no conflict of interest., (© Springer-Verlag GmbH Germany, part of Springer Nature 2020.)
- Published
- 2020
- Full Text
- View/download PDF
17. Social media mining for birth defects research: A rule-based, bootstrapping approach to collecting data for rare health-related events on Twitter.
- Author
- Klein AZ, Sarker A, Cai H, Weissenbacher D, and Gonzalez-Hernandez G
- Subjects
- Algorithms, Congenital Abnormalities epidemiology, Europe, False Positive Reactions, Female, Georgia, Humans, Illinois, Infant, Infant, Newborn, International Classification of Diseases, Machine Learning, Male, Natural Language Processing, Pregnancy, Reproducibility of Results, Unified Medical Language System, United States, Congenital Abnormalities diagnosis, Data Collection methods, Data Mining methods, Heart Defects, Congenital diagnosis, Social Media
- Abstract
Background: Although birth defects are the leading cause of infant mortality in the United States, methods for observing human pregnancies with birth defect outcomes are limited., Objective: The primary objectives of this study were (i) to assess whether rare health-related events (in this case, birth defects) are reported on social media, (ii) to design and deploy a natural language processing (NLP) approach for collecting such sparse data from social media, and (iii) to utilize the collected data to discover a cohort of women whose pregnancies with birth defect outcomes could be observed on social media for epidemiological analysis., Methods: To assess whether birth defects are mentioned on social media, we mined 432 million tweets posted by 112,647 users who were automatically identified via their public announcements of pregnancy on Twitter. To retrieve tweets that mention birth defects, we developed a rule-based, bootstrapping approach, which relies on a lexicon, lexical variants generated from the lexicon entries, regular expressions, post-processing, and manual analysis guided by distributional properties. To identify users whose pregnancies with birth defect outcomes could be observed for epidemiological analysis, inclusion criteria were (i) tweets indicating that the user's child has a birth defect, and (ii) accessibility to the user's tweets during pregnancy. We conducted a semi-automatic evaluation to estimate the recall of the tweet-collection approach, and performed a preliminary assessment of the prevalence of selected birth defects among the pregnancy cohort derived from Twitter., Results: We manually annotated 16,822 retrieved tweets, distinguishing tweets indicating that the user's child has a birth defect (true positives) from tweets that merely mention birth defects (false positives). Inter-annotator agreement was substantial: κ = 0.79 (Cohen's kappa). Analyzing the timelines of the 646 users whose tweets were true positives resulted in the discovery of 195 users that met the inclusion criteria. Congenital heart defects are the most common type of birth defect reported on Twitter, consistent with findings in the general population. Based on an evaluation of 4169 tweets retrieved using alternative text mining methods, the recall of the tweet-collection approach was 0.95., Conclusions: Our contributions include (i) evidence that rare health-related events are indeed reported on Twitter, (ii) a generalizable, systematic NLP approach for collecting sparse tweets, (iii) a semi-automatic method to identify undetected tweets (false negatives), and (iv) a collection of publicly available tweets by pregnant users with birth defect outcomes, which could be used for future epidemiological analysis. In future work, the annotated tweets could be used to train machine learning algorithms to automatically identify users reporting birth defect outcomes, enabling the large-scale use of social media mining as a complementary method for such epidemiological research., (Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.)
- Published
- 2018
- Full Text
- View/download PDF
18. Social media engagement analysis of U.S. Federal health agencies on Facebook.
- Author
- Bhattacharya S, Srinivasan P, and Polgreen P
- Subjects
- Communication, Humans, Models, Statistical, Patient Acceptance of Health Care, United States, Information Dissemination, Information Seeking Behavior, Social Media, Social Networking, United States Dept. of Health and Human Services
- Abstract
Background: It is becoming increasingly common for individuals and organizations to use social media platforms such as Facebook. These are being used for a wide variety of purposes including disseminating, discussing and seeking health related information. U.S. Federal health agencies are leveraging these platforms to 'engage' social media users to read, spread, promote and encourage health related discussions. However, different agencies and their communications get varying levels of engagement. In this study we use statistical models to identify factors that associate with engagement., Methods: We analyze over 45,000 Facebook posts from 72 Facebook accounts belonging to 24 health agencies. Account usage, user activity, sentiment and content of these posts are studied. We use the hurdle regression model to identify factors associated with the level of engagement and Cox proportional hazards model to identify factors associated with duration of engagement., Results: In our analysis we find that agencies and accounts vary widely in their usage of social media and activity they generate. Statistical analysis shows, for instance, that Facebook posts with more visual cues such as photos or videos or those which express positive sentiment generate more engagement. We further find that posts on certain topics such as occupation or organizations negatively affect the duration of engagement., Conclusions: We present the first comprehensive analyses of engagement with U.S. Federal health agencies on Facebook. In addition, we briefly compare and contrast findings from this study to our earlier study with similar focus but on Twitter to show the robustness of our methods.
- Published
- 2017
- Full Text
- View/download PDF
19. YouTube as a crowd-generated water level archive.
- Author
-
Michelsen N, Dirks H, Schulz S, Kempe S, Al-Saud M, and Schüth C
- Abstract
In view of the substantial costs associated with classic monitoring networks, participatory data collection methods can be deemed a promising option to obtain complementary data. An emerging trend in this field is social media mining, i.e., harvesting of pre-existing, crowd-generated data from social media. Although this approach is participatory in a broader sense, the users are mostly not aware of their participation in research. Inspired by this novel development, we demonstrate in this study that it is possible to derive a water level time series from the analysis of multiple YouTube videos. As an example, we studied the recent water level rise in Dahl Hith, a Saudi Arabian cave. To do so, we screened 16 YouTube videos of the cave for suitable reference points (e.g., cave graffiti). Then, we visually estimated the distances between these points and the water level and traced their changes over time. To bridge YouTube hiatuses, we considered own photos taken during two site visits. For the time period 2013-2014, we estimate a rise of 9.5 m. The fact that this rise occurred at a somewhat constant rate of roughly 0.4 m per month points towards a new and permanent water source, possibly two nearby lakes formed from treated sewage effluent. An anomaly in the rising rate is noted for autumn 2013 (1.3 m per month). As this increased pace coincides with a cluster of rain events, we deem rapid groundwater recharge along preferential flow paths a likely cause. Despite the sacrifice in precision, we believe that YouTube harvesting may represent a viable option to gather historical water levels in data-scarce settings and that it could be adapted to other environments (e.g., flood extents). In certain areas, it might provide an additional tool for the monitoring toolbox, thereby possibly delivering hydrological data for water resources management. (Copyright © 2016 Elsevier B.V. All rights reserved.)
- Published
- 2016
- Full Text
- View/download PDF
20. A framework for detecting unfolding emergencies using humans as sensors.
- Author
-
Avvenuti M, Cimino MG, Cresci S, Marchetti A, and Tesconi M
- Abstract
The advent of online social networks (OSNs) paired with the ubiquitous proliferation of smartphones has enabled social sensing systems. In the last few years, the aptitude of humans to spontaneously collect and timely share context information has been exploited for emergency detection and crisis management. Apart from event-specific features, these systems share technical approaches and architectural solutions to address the issues with capturing, filtering and extracting meaningful information from data posted to OSNs by networks of human sensors. This paper proposes a conceptual and architectural framework for the design of emergency detection systems based on the "human as a sensor" (HaaS) paradigm. An ontology for the HaaS paradigm in the context of emergency detection is defined. Then, a modular architecture, independent of a specific emergency type, is designed. The proposed architecture is demonstrated by an implemented application for detecting earthquakes via Twitter. Validation and experimental results based on messages posted during earthquakes that occurred in Italy are reported.
- Published
- 2016
- Full Text
- View/download PDF
21. Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features.
- Author
-
Nikfarjam A, Sarker A, O'Connor K, Ginn R, and Gonzalez G
- Subjects
- Humans, Natural Language Processing, Semantics, Artificial Intelligence, Data Mining methods, Pharmacovigilance, Social Media
- Abstract
Objective: Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be utilized for public health monitoring tasks, particularly for pharmacovigilance, via the use of natural language processing (NLP) techniques. However, the language in social media is highly informal, and user-expressed medical concepts are often nontechnical, descriptive, and challenging to extract. There has been limited progress in addressing these challenges, and thus far, advanced machine learning-based NLP techniques have been underutilized. Our objective is to design a machine learning-based approach to extract mentions of adverse drug reactions (ADRs) from highly informal text in social media. Methods: We introduce ADRMine, a machine learning-based concept extraction system that uses conditional random fields (CRFs). ADRMine utilizes a variety of features, including a novel feature for modeling words' semantic similarities. The similarities are modeled by clustering words based on unsupervised, pretrained word representation vectors (embeddings) generated from unlabeled user posts in social media using a deep learning technique. Results: ADRMine outperforms several strong baseline systems in the ADR extraction task by achieving an F-measure of 0.82. Feature analysis demonstrates that the proposed word cluster features significantly improve extraction performance. Conclusion: It is possible to extract complex medical concepts, with relatively high performance, from informal, user-generated content. Our approach is particularly scalable, suitable for social media mining, as it relies on large volumes of unlabeled data, thus diminishing the need for large, annotated training data sets. (© The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.)
- Published
- 2015
- Full Text
- View/download PDF
22. Filtering big data from social media--Building an early warning system for adverse drug reactions.
- Author
-
Yang M, Kiang M, and Shang W
- Subjects
- Humans, Data Mining methods, Drug-Related Side Effects and Adverse Reactions prevention & control, Internet, Pharmacovigilance, Social Media
- Abstract
Objectives: Adverse drug reactions (ADRs) are believed to be a leading cause of death in the world. Pharmacovigilance systems are aimed at early detection of ADRs. With the popularity of social media, Web forums and discussion boards have become important sources of data for consumers to share their drug use experience, and as a result may provide useful information on drugs and their adverse reactions. In this study, we propose an automated ADR related posts filtering mechanism using text classification methods. In real-life settings, ADR related messages are highly distributed in social media, while non-ADR related messages are unspecific and topically diverse. It is expensive to manually label a large amount of ADR related messages (positive examples) and non-ADR related messages (negative examples) to train classification systems. To mitigate this challenge, we examine the use of a partially supervised learning classification method to automate the process. Methods: We propose a novel pharmacovigilance system leveraging a Latent Dirichlet Allocation modeling module and a partially supervised classification approach. We select drugs with more than 500 threads of discussion, and collect all the original posts and comments of these drugs using an automatic Web spidering program as the text corpus. Various classifiers were trained by varying the number of positive examples and the number of topics. The trained classifiers were applied to 3000 posts published over 60 days. Top-ranked posts from each classifier were pooled and the resulting set of 300 posts was reviewed by a domain expert to evaluate the classifiers. Results: Compared to the alternative approaches using supervised learning methods and three general purpose partially supervised learning methods, our approach performs significantly better in terms of precision, recall, and the F measure (the harmonic mean of precision and recall), based on a computational experiment using online discussion threads from Medhelp. Conclusions: Our design provides satisfactory performance in identifying ADR related posts for post-marketing drug surveillance. The overall design of our system also points out a potentially fruitful direction for building other early warning systems that need to filter big data from social media networks. (Copyright © 2015 Elsevier Inc. All rights reserved.)
- Published
- 2015
- Full Text
- View/download PDF
23. #nowplaying Madonna: a large-scale evaluation on estimating similarities between music artists and between movies from microblogs.
- Author
-
Schedl M
- Abstract
Different term weighting techniques such as [Formula: see text] or BM25 have been used intensely for manifold text-based information retrieval tasks. Their use for modeling term profiles for named entities and subsequent calculation of similarities between these named entities have been studied to a much smaller extent. The recent trend of microblogging made available massive amounts of information about almost every topic around the world. Therefore, microblogs represent a valuable source for text-based named entity modeling. In this paper, we present a systematic and comprehensive evaluation of different term weighting measures, normalization techniques, query schemes, index term sets, and similarity functions for the task of inferring similarities between named entities, based on data extracted from microblog posts. We analyze several thousand combinations of choices for the above mentioned dimensions, which influence the similarity calculation process, and we investigate in which way they impact the quality of the similarity estimates. Evaluation is performed using three real-world data sets: two collections of microblogs related to music artists and one related to movies. For the music collections, we present results of genre classification experiments using as benchmark genre information from allmusic.com. For the movie collection, we present results of multi-class classification experiments using as benchmark categories from IMDb. We show that microblogs can indeed be exploited to model named entity similarity with remarkable accuracy, provided the correct settings for the analyzed aspects are used. We further compare the results to those obtained when using Web pages as data source.
- Published
- 2012
- Full Text
- View/download PDF