Search Results (37 results)
2. Machine Learning Driven Mental Stress Detection on Reddit Posts Using Natural Language Processing.
- Author
- Inamdar, Shaunak, Chapekar, Rishikesh, Gite, Shilpa, and Pradhan, Biswajeet
- Subjects
- PSYCHOLOGICAL stress, ANXIETY, SOCIAL media, MACHINE learning, NATURAL language processing
- Abstract
People's mental conditions are often reflected in their social media activity due to the internet's anonymity. Psychiatric issues are often detected through such activities and can be addressed in their early stages, potentially preventing the consequences of unattended mental disorders like depression and anxiety. In this paper, the authors implement machine learning models and use various embedding techniques to classify posts from the social media site Reddit as stressful or non-stressful. The dataset used contains user posts that can be analyzed to detect patterns in the social media activity of those diagnosed with mental disorders. This paper uses different NLP (Natural Language Processing) tools such as ELMo (Embeddings from Language Models) word embeddings, BERT (Bidirectional Encoder Representations from Transformers) tokenizers, and the BoW (Bag of Words) approach to create word/sentence representations that can be fed to machine learning models. The results of each method are discussed. The best model achieved an F1 score of 0.76, a precision of 0.71, and a recall of 0.74 using only the preprocessed texts and machine learning algorithms to classify the posts. These results are significant and have the potential to be applied in real-world scenarios to analyze mental stress among social media users. Although this paper focuses on data from Reddit, the techniques used can be transferred to similar social media platforms and could help address the growing mental health crisis. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
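The Bag-of-Words classification pipeline this abstract describes can be sketched as follows. This is an illustrative sketch only, not the authors' code: the toy posts, labels, and choice of logistic regression are all invented for demonstration.

```python
# Illustrative sketch of a BoW stress classifier: CountVectorizer builds
# Bag-of-Words features, a linear model classifies, F1 measures performance.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

posts = [
    "I cannot sleep, deadlines are crushing me",
    "constant worry about exams and money",
    "had a relaxing walk in the park today",
    "great dinner with friends, feeling good",
    "panicking about tomorrow's interview",
    "enjoying a quiet weekend at home",
]
labels = [1, 1, 0, 0, 1, 0]  # 1 = stressful, 0 = non-stressful

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(posts, labels)

preds = model.predict(posts)
print(f1_score(labels, preds))  # training-set F1, just to exercise the pipeline
```

In practice the vectorizer would be fit on a large training split and evaluated on held-out posts, which is where scores like the paper's 0.76 F1 come from.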
3. A hybrid dependency-based approach for Urdu sentiment analysis.
- Author
- Sehar, Urooba, Kanwal, Summrina, Allheeib, Nasser I., Almari, Sultan, Khan, Faiza, Dashtipur, Kia, Gogate, Mandar, and Khashan, Osama A.
- Subjects
- SENTIMENT analysis, SOCIAL media, NATURAL language processing, ARTIFICIAL neural networks, SOCIAL media in business, DIGITAL technology, AGE groups
- Abstract
In the digital age, social media has emerged as a significant platform, generating a vast amount of raw data daily. This data reflects the opinions of individuals from diverse backgrounds, races, cultures, and age groups, spanning a wide range of topics. Businesses can leverage this data to extract valuable insights, improve their services, and effectively reach a broader audience based on users' expressed opinions on social media platforms. To harness the potential of this extensive and unstructured data, a deep understanding of Natural Language Processing (NLP) is crucial. Existing approaches for sentiment analysis (SA) often rely on word co-occurrence frequencies, which prove inefficient in practical scenarios. Identifying this research gap, this paper presents a framework for concept-level sentiment analysis, aiming to enhance its accuracy. A comprehensive Urdu language dataset was constructed by collecting data from YouTube, consisting of various talks and reviews on topics such as movies, politics, and commercial products. The dataset was further enriched by incorporating language rules and Deep Neural Networks (DNN) to optimize polarity detection. For sentiment analysis, the proposed framework employs predefined rules to trigger sentiment flow from words to concepts, leveraging the dependency relations among different words in a sentence based on Urdu grammatical rules. In cases where predefined patterns are not triggered, the framework seamlessly switches to its sub-symbolic counterpart, passing the data to the DNN for sentence classification. Experimental results demonstrate that the proposed framework surpasses state-of-the-art approaches, including LSTM, CNN, SVM, LR, and MLP, achieving an improvement of 6–7% on the Urdu dataset. In conclusion, this research paper introduces a novel framework for concept-level sentiment analysis of Urdu language data sourced from social media platforms. 
By combining language rules and DNN, the proposed framework demonstrates superior performance compared to existing methodologies, showcasing its effectiveness in accurately analyzing sentiment in Urdu text data. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
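The hybrid dispatch described in the abstract above, where rules fire first and unmatched sentences fall through to a learned model, can be sketched as below. The rules, the toy fallback, and the example sentences are all invented; the paper uses Urdu dependency-grammar rules and a DNN rather than this substring matching.

```python
# Sketch of rule-first classification with a sub-symbolic fallback.
def classify(sentence, rules, fallback):
    """Return a rule's label if any pattern matches, else defer to the model."""
    for pattern, label in rules:
        if pattern in sentence:
            return label
    return fallback(sentence)

# Hypothetical rules and a stand-in "model" for demonstration only.
rules = [("not good", "negative"), ("very good", "positive")]
fallback = lambda s: "positive" if "enjoy" in s else "negative"

print(classify("the movie was very good", rules, fallback))    # a rule fires
print(classify("we enjoyed the performance", rules, fallback)) # falls back to the model
```

The design point is that symbolic rules give precise, explainable decisions where they apply, while the learned fallback guarantees coverage.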
4. Vaccine sentiment analysis using BERT + NBSVM and geo-spatial approaches.
- Author
- Umair, Areeba, Masciari, Elio, and Ullah, Muhammad Habib
- Subjects
- LANGUAGE models, SENTIMENT analysis, GEOLOGICAL statistics, NATURAL language processing, SOCIAL media, MACHINE learning, USER-generated content
- Abstract
Since the outbreak of the novel coronavirus disease in 2019 (hereafter referred to as COVID-19), millions of people worldwide have been affected by the pandemic, which has significantly impacted our habits in various ways. In order to eradicate the disease, great help came from unprecedentedly fast vaccine development along with the adoption of strict preventive measures like lockdowns. Thus, worldwide provisioning of vaccines was crucial in order to achieve the maximum immunization of the population. However, the fast development of vaccines, driven by the urge to limit the pandemic, caused skeptical reactions in a vast share of the population. More specifically, people's hesitancy in getting vaccinated was an additional obstacle in fighting COVID-19. To ameliorate this scenario, it is important to understand people's sentiments about vaccines in order to take proper actions to better inform the population. As a matter of fact, people continuously update their feelings and sentiments on social media, thus a proper analysis of those opinions is an important challenge for providing proper information and avoiding misinformation. More in detail, sentiment analysis (Wankhade et al. in Artif Intell Rev 55(7):5731–5780, 2022. https://doi.org/10.1007/s10462-022-10144-1) is a powerful technique in natural language processing that enables the identification and classification of people's feelings (mainly) in text data. It involves the use of machine learning algorithms and other computational techniques to analyze large volumes of text and determine whether they express positive, negative or neutral sentiment. Sentiment analysis is widely used in industries such as marketing, customer service, and healthcare, among others, to gain actionable insights from customer feedback, social media posts, and other forms of unstructured textual data. 
In this paper, sentiment analysis is used to examine people's reactions to COVID-19 vaccines in order to provide useful insights and improve understanding of their correct usage and possible advantages. A framework that leverages artificial intelligence (AI) methods is proposed for classifying tweets based on their polarity values. We analyzed Twitter data related to COVID-19 vaccines after applying the most appropriate pre-processing to them. More specifically, we identified the word clouds of negative, positive, and neutral words using an artificial intelligence tool to determine the sentiment of tweets. After this pre-processing step, we performed classification using the BERT + NBSVM model to classify people's sentiments about vaccines. The reason for combining bidirectional encoder representations from transformers (BERT) with Naive Bayes and support vector machine (NBSVM) can be understood by considering the limitation of BERT-based approaches, which only leverage encoder layers, resulting in lower performance on short texts like the ones used in our analysis. Such a limitation can be ameliorated by using Naive Bayes and Support Vector Machine approaches, which are able to achieve higher performance in short-text sentiment analysis. Thus, we took advantage of both BERT features and NBSVM features to define a flexible framework for our sentiment analysis goal related to vaccine sentiment identification. Moreover, we enrich our results with a spatial analysis of the data, using geo-coding, visualization, and spatial correlation analysis to suggest the most suitable vaccination centers to users based on the sentiment analysis outcomes. In principle, we do not need to implement a distributed architecture to run our experiments, as the available public data are not massive. However, we discuss a high-performance architecture that would be used if the collected data scaled up dramatically. 
We compared our approach with state-of-the-art methods using the most widely adopted metrics: Accuracy, Precision, Recall and F-measure. The proposed BERT + NBSVM outperformed alternative models, achieving 73% accuracy, 71% precision, 88% recall and 73% F-measure for the classification of positive sentiments, and 73% accuracy, 71% precision, 74% recall and 73% F-measure for the classification of negative sentiments. These promising results are discussed in detail in the following sections. The use of artificial intelligence methods and social media analysis can lead to a better understanding of people's reactions and opinions about any trending topic. However, in the case of health-related topics like COVID-19 vaccines, proper sentiment identification could be crucial for implementing public health policies. More in detail, the availability of useful findings on user opinions about vaccines can help policymakers design proper strategies and implement ad-hoc vaccination protocols according to people's feelings, in order to provide better public service. To this end, we leveraged geospatial information to support effective recommendations for vaccination centers. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
5. Multidimensional Author Profiling for Social Business Intelligence.
- Author
- Lanza-Cruz, Indira, Berlanga, Rafael, and Aramburu, María José
- Subjects
- SOCIAL intelligence, BUSINESS intelligence, LABEL design, NATURAL language processing, SOCIAL networks, MULTIDIMENSIONAL databases
- Abstract
This paper presents a novel author profiling method specially aimed at classifying social network users into the multidimensional perspectives of social business intelligence (SBI) applications. In this scenario, since user profiles are defined on demand for each particular SBI application, we cannot assume the existence of labelled datasets for training purposes. Thus, we propose an unsupervised method to obtain the required labelled datasets for training the profile classifiers. Contrary to other author profiling approaches in the literature, we only make use of the users' descriptions, which are usually part of the post metadata. We exhaustively evaluated the proposed method on four different tasks for multidimensional author profiling along with state-of-the-art text classifiers. We achieved F1 scores of around 88% and 98% on gold-standard and silver-standard datasets, respectively. Additionally, we compare our results to other supervised approaches previously proposed for two of our tasks, achieving very similar performance despite using an unsupervised method. To the best of our knowledge, this is the first method designed to label user profiles in an unsupervised way for training profile classifiers with a performance similar to fully supervised ones. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. A deep semantic matching approach for identifying relevant messages for social media analysis.
- Author
- Biggers, Frederick Brown, Mohanty, Somya D., and Manda, Prashanti
- Subjects
- NATURAL language processing, SOCIAL media, HURRICANE Irma, 2017, NATURAL languages, WORD frequency, USER-generated content, SEMANTICS
- Abstract
There is a growing interest in using social media content for Natural Language Processing applications. However, it is not easy to computationally identify the most relevant set of tweets related to any specific event. Challenging semantics coupled with different ways of using natural language in social media make it difficult to retrieve the most relevant set of data from any social media outlet. This paper seeks to demonstrate a way to present the changing semantics of Twitter within the context of a crisis event, specifically tweets during Hurricane Irma. These methods can be used to identify the corpus of text most relevant to a specific incident such as a hurricane. Using an implementation of the Word2Vec neural-network method for creating word embeddings, this paper will: discuss how the relative meaning of words changes as events unfold; present a mechanism for scoring tweets based upon dynamic, relative context relatedness; and show that similarity between words is not necessarily static. We present different methods for training the vector model in Word2Vec for identification of the tweets most relevant to any search query. The impact of tuning parameters such as word window size, minimum word frequency, hidden layer dimensionality, and negative sampling on model performance was explored. The window containing the local maximum of AUROC for each parameter serves as a guide for other studies using the methods presented here for social media data analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
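The tweet-scoring mechanism this abstract describes, ranking tweets by embedding similarity to a search query, can be sketched with plain vector arithmetic. This is an assumed, simplified version: the 4-dimensional toy word vectors below stand in for embeddings a trained Word2Vec model would produce, and a tweet is represented by the mean of its word vectors.

```python
# Sketch: score tweets by cosine similarity between the query vector and
# each tweet's mean word vector. Toy embeddings, not a trained model.
import numpy as np

word_vecs = {
    "hurricane": np.array([0.9, 0.1, 0.0, 0.0]),
    "irma":      np.array([0.8, 0.2, 0.1, 0.0]),
    "flooding":  np.array([0.7, 0.3, 0.0, 0.1]),
    "pizza":     np.array([0.0, 0.1, 0.9, 0.2]),
    "recipe":    np.array([0.1, 0.0, 0.8, 0.3]),
}

def tweet_vector(tweet):
    """Mean of the word vectors for in-vocabulary tokens."""
    vecs = [word_vecs[w] for w in tweet.lower().split() if w in word_vecs]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = tweet_vector("hurricane irma")
tweets = ["irma flooding", "pizza recipe"]
scores = {t: cosine(query, tweet_vector(t)) for t in tweets}
print(scores)
```

In the paper's setting, the interesting part is that `word_vecs` itself shifts as the event unfolds, so the same query ranks tweets differently over time.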
7. Comparative analysis of deep learning based Afaan Oromo hate speech detection.
- Author
- Ganfure, Gaddisa Olani
- Subjects
- HATE speech, DEEP learning, NATURAL language processing, SOCIAL media, SPEECH perception, MACHINE learning, COMPARATIVE studies
- Abstract
Social media platforms like Facebook, YouTube, and Twitter are banking on developing machine learning models to help stop the spread of hateful speech on their platforms. The idea is that machine learning models that utilize natural language processing will detect hate speech faster and better than people can. Although considerable progress has been made for resource-rich languages, only a few attempts have been made for Ethiopian languages such as Afaan Oromo. This paper examines the viability of deep learning models for Afaan Oromo hate speech recognition. Toward this end, the biggest dataset of Afaan Oromo hate speech to date was collected and annotated by language experts. Variations of deep learning models such as CNN, LSTM, BiLSTM, GRU, and CNN-LSTM are examined to evaluate their viability in identifying Afaan Oromo hate speech. The results show that the model based on CNN and BiLSTM outperforms all the other investigated models with an average F1-score of 87%. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
8. A survey on intention analysis: successful approaches and open challenges.
- Author
- Hamroun, Mohamed and Gouider, Mohamed Salah
- Subjects
- INTENTION, ATTITUDE (Psychology), NATURAL language processing
- Abstract
Intention Analysis is a computational task that analyzes people's desires, wishes, and attitudes from user-generated texts. This sub-field of text mining has recently attracted research interest. This research paper provides an overview and an analysis of the latest studies in this field. These studies were categorized and summarized according to their contributions and the techniques they used. Several proposed approaches and some real applications were investigated in depth and presented in detail. Moreover, some related fields to intention analysis such as Transfer Learning (TL), Spam Detection (SD), and Building Resources (BR) were discussed in this survey of the literature dedicated to Intention Analysis. The aim of this survey is to give a comprehensive view of the intention analysis field supported by a number of graphics and summary tables about the literature. The paper concludes by identifying a number of research topics that can be promising for future research. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
9. New explainability method for BERT-based model in fake news detection.
- Author
- Szczepański, Mateusz, Pawlicki, Marek, Kozik, Rafał, and Choraś, Michał
- Subjects
- FAKE news, SOCIAL media, NATURAL language processing, ARTIFICIAL intelligence, DEEP learning, MODERN society
- Abstract
The ubiquity of social media and their deep integration into contemporary society have granted new ways to interact, exchange information, form groups, or earn money, all on a scale never seen before. Those possibilities, paired with their widespread popularity, contribute to the level of impact that social media display. Unfortunately, the benefits they bring come at a cost. Social media can be employed by various entities to spread disinformation, so-called 'Fake News', either to make a profit or to influence the behaviour of society. To reduce the impact and spread of Fake News, a diverse array of countermeasures has been devised. These include linguistic-based approaches, which often utilise Natural Language Processing (NLP) and Deep Learning (DL). However, as the latest advancements in the Artificial Intelligence (AI) domain show, a model's high performance is no longer enough. The explainability of the system's decision is equally crucial in real-life scenarios. Therefore, the objective of this paper is to present a novel explainability approach for BERT-based fake news detectors. This approach does not require extensive changes to the system and can be attached as an extension to operating detectors. For this purpose, two Explainable Artificial Intelligence (xAI) techniques, Local Interpretable Model-Agnostic Explanations (LIME) and Anchors, will be used and evaluated on fake news data, i.e., short pieces of text forming tweets or headlines. The focus of this paper is on the explainability approach for fake news detectors, as the detectors themselves were part of the authors' previous work. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
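The perturbation idea underlying LIME-style explanations mentioned in the abstract above can be sketched without the `lime` library: drop one token at a time and measure how the classifier's score moves. This is a hedged illustration only; the keyword-count "detector" below is an invented stand-in for a BERT-based model, and real LIME fits a local surrogate model over many random perturbations rather than single-token occlusion.

```python
# Occlusion sketch: token importance = score drop when that token is removed.
def fake_score(text):
    """Toy detector: fraction of tokens that are sensational keywords."""
    sensational = {"shocking", "miracle", "exposed"}
    tokens = text.lower().split()
    return sum(t in sensational for t in tokens) / len(tokens)

headline = "shocking miracle cure exposed by doctors"
base = fake_score(headline)

tokens = headline.split()
importance = {}
for i, tok in enumerate(tokens):
    rest = " ".join(tokens[:i] + tokens[i + 1:])
    importance[tok] = base - fake_score(rest)

top3 = sorted(importance, key=importance.get, reverse=True)[:3]
print(top3)
```

The output ranks the sensational keywords first, which is the kind of per-token attribution such explainers surface to a human reviewer.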
10. Review and content analysis of textual expressions as a marker for depressive and anxiety disorders (DAD) detection using machine learning.
- Author
- Sharma, Chandra Mani, Damani, Darsh, and Chariar, Vijayaraghavan M.
- Subjects
- MENTAL depression, GENERALIZED anxiety disorder, ANXIETY disorders, MACHINE learning, DIGITAL technology, SOCIAL media, CONTENT analysis
- Abstract
Depressive disorders (including major depressive disorder and dysthymia) and anxiety (generalized anxiety disorder or GAD) disorders are the two most prevalent mental illnesses. Early diagnosis of these afflictions can lead to cost-effective treatment with better outcome prospects. With the advent of digital technology and platforms, people express themselves by various means, such as social media posts, blogs, journals, instant messaging services, etc. Text remains the most common and convenient form of expression. Therefore, it can be used to predict the onset of anxiety and depression. Scopus and Web of Science (WoS) databases were used to retrieve the relevant literature using a set of predefined search strings. Irrelevant publications were filtered using multiple criteria. The research metadata was subsequently analyzed using the Biblioshiny tool of R. Finally, a comparative analysis of the most suitable documents is presented. A total of 103 documents were used for bibliometric mapping in terms of research outcome over the past years; productivity of authors, institutions, and countries; collaborations; trend topics; keyword co-occurrence; etc. Neural networks and support vector machines are the most popular ML techniques; word embeddings are extensively used for text representations. There is a shift toward using multiple modalities. SVM, Naive Bayes, and LSTM are the most used ML methods; social media is the most used source of data (Twitter is the most common platform); and audio is the modality most often combined with text for depressive and anxiety disorder (DAD) detection. Text data provides good cues for the detection of DAD using machine learning. However, the findings in most cases are based on a limited amount of data. Using large amounts of data with other modalities can help develop more generalized DAD-detection systems. 
Asian countries are leading in the research output with China and India being the top countries in terms of the number of research publications. However, more international collaborations are needed. Limited research exists for anxiety disorders. Co-occurrence of anxiety and depressive disorders is high (33% of studies). [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
11. Social media text analytics of Malayalam–English code-mixed using deep learning.
- Author
- Thara, S. and Poornachandran, Prabaharan
- Subjects
- SOCIAL media, DEEP learning, NATURAL language processing, SENTIMENT analysis, TEXT messages, COMPUTATIONAL linguistics, MACHINE learning
- Abstract
Zigzag conversational patterns of content in social media are often perceived as noisy or informal text. Unrestricted usage of vocabulary in social media communications complicates the processing of code-mixed text. This paper accentuates two major aspects of code-mixed text: offensive language identification and sentiment analysis for a Malayalam–English code-mixed data set. The proposed framework addresses three key points for these tasks: dependencies among features created by embedding methods (Word2Vec and FastText); comparative analysis of deep learning algorithms (uni-/bi-directional models, hybrid models, and transformer approaches); and the relevance of selective translation and transliteration and hyper-parameter optimization. This yielded F1-scores of 0.76 on the Forum for Information Retrieval Evaluation (FIRE) 2020 data set and 0.99 on the European Chapter of the Association for Computational Linguistics (EACL) 2021 data set. A detailed error analysis was also done to give meaningful insights. The proposed strategy achieved the best results among the benchmarked models dealing with Malayalam–English code-mixed messages, and it serves as an important step towards societal good. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
12. Epidemic zone of COVID-19 from social media using hypergraph with weighting factor (HWF).
- Author
- Pradeepa, S. and Manjula, K. R.
- Subjects
- COVID-19 pandemic, ONLINE social networks, INFECTIOUS disease transmission, SARS-CoV-2, COVID-19, USER-generated content, SOCIAL media
- Abstract
Online social networks are among the most prominent media holding information about society's epidemic problems. For privacy reasons, most users will not disclose their location. Detecting the location of tweeting users is required to track the geographic spread of diseases. This work aims to detect the spreading locations of the COVID-19 disease from Twitter users and the content discussed in their tweets. COVID-19 is a disease caused by the "novel coronavirus." About 80% of confirmed cases recover from the disease; however, according to the World Health Organization, one out of every six people who get COVID-19 can become seriously ill. Inferring user locations to identify the spreading locations of the disease is a very challenging task. This paper proposes a new technique based on a hypergraph model to detect Twitter users' locations with respect to the spreading disease. This model uses the hypergraph with weighting factor (HWF) technique to infer the spatial locations of the spreading disease. The accuracy of prediction can be improved when a massive volume of streaming data is analyzed. The Helly property of the hypergraph was applied to discard less potential words from the text analysis, which makes this work unique. A weighting factor was introduced to calculate the score of each location for a particular user. The location of each user is predicted as the one with the highest weighting factor. The proposed framework has been evaluated and tested for various measures like precision, recall and F-measure, and the promising results obtained substantiate this work's claims in comparison with state-of-the-art methodologies. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
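The final scoring step of the abstract above, accumulating a weighting factor per candidate location and predicting the highest-scoring one, can be sketched as below. The locations and weights are invented; in the paper they are derived from a hypergraph over tweet terms (after Helly-property pruning), which this sketch does not attempt to reproduce.

```python
# Sketch: accumulate per-location weights for one user, predict the argmax.
from collections import defaultdict

# Hypothetical (location, weight) evidence extracted from one user's tweets.
evidence = [("chennai", 0.6), ("delhi", 0.2), ("chennai", 0.3), ("mumbai", 0.1)]

scores = defaultdict(float)
for loc, weight in evidence:
    scores[loc] += weight

predicted = max(scores, key=scores.get)
print(predicted)  # -> chennai (total weight 0.9)
```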
13. Investigating the impact of pre-processing techniques and pre-trained word embeddings in detecting Arabic health information on social media.
- Author
- Albalawi, Yahya, Buckley, Jim, and Nikolov, Nikola S.
- Subjects
- MEDICAL communication, DEEP learning, SOCIAL media, NATURAL language processing, MACHINE learning
- Abstract
This paper presents a comprehensive evaluation of data pre-processing and word embedding techniques in the context of Arabic document classification in the domain of health-related communication on social media. We evaluate 26 text pre-processing techniques applied to Arabic tweets within the process of training a classifier to identify health-related tweets. For this task we use the (traditional) machine learning classifiers KNN, SVM, Multinomial NB and Logistic Regression. Furthermore, we report experimental results with the deep learning architectures BLSTM and CNN for the same text classification problem. Since word embeddings are more typically used as the input layer in deep networks, in the deep learning experiments we evaluate several state-of-the-art pre-trained word embeddings with the same text pre-processing applied. To achieve these goals, we use two data sets: one for both training and testing, and another for testing the generality of our models only. Our results point to the conclusion that only four out of the 26 pre-processing techniques improve the classification accuracy significantly. For the first data set of Arabic tweets, we found that Mazajak CBOW pre-trained word embeddings as the input to a BLSTM deep network led to the most accurate classifier, with an F1 score of 89.7%. For the second data set, Mazajak Skip-Gram pre-trained word embeddings as the input to BLSTM led to the most accurate model, with an F1 score of 75.2% and accuracy of 90.7%, compared to an F1 score of 90.8% achieved by Mazajak CBOW for the same architecture but with a lower accuracy of 70.89%. Our results also show that the performance of the best of the traditional classifiers we trained is comparable to the deep learning methods on the first dataset, but significantly worse on the second dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
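Two pre-processing steps commonly evaluated in work like the abstract above, stripping diacritics (tashkeel) and normalising alef variants, can be sketched with plain regular expressions. This is an assumed, minimal illustration; which of the paper's 26 techniques these correspond to, and their exact definitions, are not taken from the paper.

```python
# Sketch of two common Arabic text normalisation steps.
import re

DIACRITICS = re.compile(r"[\u064B-\u065F\u0670]")      # tashkeel marks
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")    # آ أ إ

def normalize(text):
    text = DIACRITICS.sub("", text)           # remove diacritics
    text = ALEF_VARIANTS.sub("\u0627", text)  # unify to bare alef ا
    return text

# "\u0623\u064E\u0647\u0652\u0644\u0627\u064B" is a vocalised spelling; the
# result is the bare, normalised form.
print(normalize("\u0623\u064E\u0647\u0652\u0644\u0627\u064B"))  # -> "\u0627\u0647\u0644\u0627"
```

Such normalisation reduces sparsity, since vocalised and unvocalised spellings of the same word collapse to one token before vectorisation.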
14. Text based personality prediction from multiple social media data sources using pre-trained language model and model averaging.
- Author
- Christian, Hans, Suhartono, Derwin, Chowanda, Andry, and Zamli, Kamal Z.
- Subjects
- DEEP learning, FEATURE extraction, FIVE-factor model of personality, SOCIAL media, PERSONALITY assessment, PERSONALITY tests, PERSONALITY
- Abstract
The ever-increasing number of social media users has dramatically contributed to significant growth in the volume of online information. Often, the content that these users put on social media can give valuable insights into their personalities (e.g., in terms of predicting job satisfaction, specific preferences, and the success of professional and romantic relationships), obtained without the hassle of taking a formal personality test. Termed personality prediction, the process involves extracting the digital content into features and mapping them according to a personality model. Owing to its simplicity and proven capability, a well-known personality model, called the big five personality traits, has often been adopted in the literature as the de facto standard for personality assessment. To date, there are many algorithms that can be used to extract contextualized word embeddings from textual data for personality prediction systems; some of them are based on ensembled models and deep learning. Although useful, existing algorithms such as RNN and LSTM suffer from the following limitations. Firstly, these algorithms take a long time to train owing to their sequential inputs. Secondly, these algorithms also lack the ability to capture the true (semantic) meaning of words; therefore, the context is slightly lost. To address these limitations, this paper introduces a new prediction system using a multi-model deep learning architecture combined with multiple pre-trained language models, such as BERT, RoBERTa, and XLNet, as the feature extraction method on social media data sources. Finally, the system makes its prediction based on model averaging. 
Unlike earlier work, which adopts a single social media data source with open- and closed-vocabulary extraction methods, the proposed work uses multiple social media data sources, namely Facebook and Twitter, and produces a predictive model for each trait using bidirectional context features combined with the extraction method. Our experience with the proposed work has been encouraging, as it has outperformed similar existing works in the literature. More precisely, our results achieve a maximum accuracy of 86.2% and an F1 score of 0.912 on the Facebook dataset, and 88.5% accuracy and an F1 score of 0.882 on the Twitter dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
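The model-averaging decision step this abstract describes can be sketched as follows: per-trait probabilities from several fine-tuned language models are averaged, and the final Big Five prediction follows the mean. The probability values below are invented for illustration; the models named in the comments are stand-ins for the paper's fine-tuned BERT, RoBERTa, and XLNet classifiers.

```python
# Sketch of model averaging over three classifiers' per-trait probabilities.
import numpy as np

traits = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]

# Each row: one model's probability that the author exhibits each trait.
probs = np.array([
    [0.8, 0.4, 0.6, 0.7, 0.2],   # e.g. a BERT-based classifier
    [0.7, 0.5, 0.7, 0.6, 0.3],   # e.g. a RoBERTa-based classifier
    [0.9, 0.3, 0.5, 0.8, 0.1],   # e.g. an XLNet-based classifier
])

mean_probs = probs.mean(axis=0)  # model averaging
predicted = [t for t, p in zip(traits, mean_probs) if p >= 0.5]
print(predicted)  # -> ['openness', 'extraversion', 'agreeableness']
```

Averaging smooths out individual models' disagreements (here the three models disagree on extraversion, but the mean settles it), which is the motivation for the ensemble.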
15. Annotating and detecting topics in social media forum and modelling the annotation to derive directions-a case study.
- Author
- Athira, B., Jones, Josette, Idicula, Sumam Mary, Kulanthaivel, Anand, and Zhang, Enming
- Subjects
- PATIENTS' attitudes, INTERNET forums, SOCIAL media, DEEP learning, MEDICAL personnel, MACHINE learning, SUPERVISED learning
- Abstract
The widespread influence of social media impacts every aspect of life, including the healthcare sector. Although medics and health professionals are the final decision makers, the advice and recommendations obtained from fellow patients are significant. In this context, the present paper explores the topics of discussion posted by breast cancer patients and survivors on online forums. The study examines an online forum, Breastcancer.org, maps the discussion entries to several topics, and proposes a machine learning model based on a classification algorithm to characterize the topics. To explore the topics discussed by breast cancer patients and survivors, approximately 1000 posts were selected and manually annotated, while millions of unlabeled posts remain available. A semi-supervised learning technique is used to build labels for the unlabeled data; the large data set is then classified using a deep learning algorithm. The deep learning algorithm BiLSTM with the BERT word embedding technique provided the best F1-score, 79.5%. This method is able to classify the following topics: medication reviews, clinician knowledge, various treatment options, seeking and providing support, diagnostic procedures, financial issues and implications for everyday life. What matters most to the patients is coping with everyday living as well as seeking and providing emotional and informational support. The approach and findings show the potential of studying social media to provide insight into patients' experiences with critical health problems like cancer. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
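The semi-supervised labelling step this abstract describes, a small set of manually annotated posts seeding labels for the unlabeled majority, can be sketched with scikit-learn's self-training wrapper. This is a hedged sketch under invented data: the topics are reduced to two ("support" vs. "medication"), the posts are fabricated, and the paper uses its own semi-supervised technique followed by BiLSTM + BERT rather than logistic regression on TF-IDF.

```python
# Sketch of self-training: y = -1 marks posts to be labelled from the seeds.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

posts = [
    "sending hugs, you are not alone in this",     # labelled: support (0)
    "so much love and strength to everyone here",  # labelled: support (0)
    "tamoxifen gave me terrible hot flashes",      # labelled: medication (1)
    "how long were you on anastrozole",            # labelled: medication (1)
    "thinking of you all, stay strong",            # unlabelled
    "did the doctor adjust your tamoxifen dose",   # unlabelled
]
y = [0, 0, 1, 1, -1, -1]

X = TfidfVectorizer().fit_transform(posts)
clf = SelfTrainingClassifier(LogisticRegression(), threshold=0.5)
clf.fit(X, y)                 # pseudo-labels confident unlabelled posts, refits
print(clf.predict(X[4:]))     # labels inferred for the two unlabelled posts
```

With real forum data the confident pseudo-labels accumulate over iterations, which is how a thousand seed annotations can stretch across millions of posts.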
16. Building socially-enabled event-enriched maps.
- Author
- Rehman, Faizan Ur, Afyouni, Imad, Lbath, Ahmed, Khan, Sohaib, and Basalamah, Saleh
- Subjects
- DIGITAL maps, NATURAL language processing, DATA packeting, TRAFFIC congestion, STREAMING technology, SOCIAL media
- Abstract
With the advancement of social sensing technologies, digital maps have recently witnessed a tremendous evolution with the aim of integrating enriched semantic layers from heterogeneous and diverse data sources. Current generations of digital maps are often crowd-sourced, allow interactive route planning, and may contain live updates, such as traffic congestion states. Within this context, we believe that the next generation of maps will introduce the concept of extracting Events of Interest (EoI) from crowdsourced data, and displaying them at different spatial scales based on their significance. This paper introduces Hadath, a scalable and efficient system that extracts social events from unstructured data streams, e.g. Twitter. Hadath applies natural language processing and multi-dimensional clustering techniques to extract relevant events of interest at different map scales, and to infer the spatio-temporal scope of detected events. Hadath also implements a hierarchical in-memory spatio-temporal indexing scheme to allow efficient and scalable access to raw data, as well as to extracted clusters of events. Initially, data packets are processed to discover events at a local scale; then, the proper spatio-temporal scope and the significance of detected events at a global scale are determined. As a result, live events can be displayed at different spatio-temporal resolutions, thus allowing a smooth and unique browsing experience. Finally, to validate our proposed system, we conducted experiments on real-time and historical social media streams. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
17. Profile update: the effects of identity disclosure on network connections and language.
- Author
- Choi, Minje, Romero, Daniel M., and Jurgens, David
- Subjects
SOCIAL media, DISCLOSURE, NATURAL language processing, MICROBLOGS, ONLINE identities, SOCIAL networks
- Abstract
Our social identities determine how we interact and engage with the world surrounding us. In online settings, individuals can make these identities explicit by including them in their public biography, possibly signaling a change in what is important to them and how they should be viewed. While there is evidence suggesting the impact of intentional identity disclosure in online social platforms, its actual effect on engagement activities at the user level has yet to be explored. Here, we perform the first large-scale study on Twitter that examines behavioral changes following identity disclosure on Twitter profiles. Combining social networks with methods from natural language processing and quasi-experimental analyses, we discover that after disclosing an identity on their profiles, users (1) tweet and retweet more in a way that aligns with their respective identities, and (2) connect more with users that disclose similar identities. We also examine whether disclosing the identity increases the chance of being targeted for offensive comments and find that in fact (3) the combined effect of disclosing identity via both tweets and profiles is associated with a reduced number of offensive replies from others. Our findings highlight that the decision to disclose one's identity in online spaces can lead to substantial changes in how they express themselves or forge connections, with a lesser degree of negative consequences than anticipated. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. Automatic discovery of adverse reactions through Chinese social media.
- Author
- Zhang, Mengxue, Zhang, Meizhuo, Ge, Chen, Liu, Quanyang, Wang, Jiemin, Wei, Jia, and Zhu, Kenny Q.
- Subjects
SOCIAL media, INTERNET forums, DRUG side effects, HIDDEN Markov models, RANDOM fields, PHYSICIANS, NATURAL language processing
- Abstract
Despite tremendous efforts made before the release of every drug, some adverse drug reactions (ADRs) may go undetected and thus cause harm to both the users and the pharmaceutical companies. One plausible venue to collect evidence of such ADRs is online social media, where patients and doctors discuss medical conditions and their treatments. There is substantial previous research on ADR extraction from English online forums; however, very limited research has been done on Chinese data. In this paper, we use posts from two popular Chinese social media platforms as our dataset. We propose a semi-supervised learning framework that detects mentions of medications and colloquial ADR terms and extracts lexicon-syntactic features from natural language text to recognize positive associations between drug use and ADRs. The key contribution is an automatic label generation algorithm, which requires very little manual annotation. This bootstrapping algorithm could also be further applied to English data. The research results indicate that our algorithm outperforms the hidden Markov model and conditional random fields. With this approach, we discovered a large number of side effects for a variety of popular medicines in real-world scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
19. Contemporary attitudes and beliefs on coronary artery calcium from social media using artificial intelligence.
- Author
- Somani, Sulaiman, Balla, Sujana, Peng, Allison W., Dudum, Ramzi, Jain, Sneha, Nasir, Khurram, Maron, David J., Hernandez-Boussard, Tina, and Rodriguez, Fatima
- Subjects
ATHEROSCLEROSIS risk factors, SOCIAL media, QUALITATIVE research, COMPUTER software, CLUSTER analysis (Statistics), RESEARCH funding, ARTIFICIAL intelligence, RADIATION injuries, PUBLIC opinion, DECISION making in clinical medicine, NATURAL language processing, DECISION making, MISINFORMATION, COMMUNICATION, CORONARY artery calcification, MACHINE learning, SENTIMENT analysis, EVIDENCE-based medicine, ALGORITHMS, DISEASE risk factors
- Abstract
Coronary artery calcium (CAC) is a powerful tool to refine atherosclerotic cardiovascular disease (ASCVD) risk assessment. Despite its growing interest, contemporary public attitudes around CAC are not well-described in literature and have important implications for shared decision-making around cardiovascular prevention. We used an artificial intelligence (AI) pipeline consisting of a semi-supervised natural language processing model and unsupervised machine learning techniques to analyze 5,606 CAC-related discussions on Reddit. A total of 91 discussion topics were identified and were classified into 14 overarching thematic groups. These included the strong impact of CAC on therapeutic decision-making, ongoing non-evidence-based use of CAC testing, and the patient perceived downsides of CAC testing (e.g., radiation risk). Sentiment analysis also revealed that most discussions had a neutral (49.5%) or negative (48.4%) sentiment. The results of this study demonstrate the potential of an AI-based approach to analyze large, publicly available social media data to generate insights into public perceptions about CAC, which may help guide strategies to improve shared decision-making around ASCVD management and public health interventions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. A sequential neural recommendation system exploiting BERT and LSTM on social media posts.
- Author
- Noorian, A., Harounabadi, A., and Hazratifard, M.
- Subjects
RECOMMENDER systems, SOCIAL media, NEUROPROSTHESES, SENTIMENT analysis, NATURAL language processing
- Abstract
Tourists share opinions about Points of Interest (POIs) through online posts and social media platforms. Opinion mining is a popular technique for extracting the feedback hidden in reviews from tourists who have visited various places; such feedback is used in several tourist applications and generally reflects their preferences toward POIs. On the other hand, planning a trip is difficult for tourists because they must pick sequential POIs in unknown areas that meet their limitations and preferences. However, most prior trip suggestion methods are suboptimal for several reasons, including that they do not consider valuable user reviews and rely exclusively on left-to-right unidirectional sequence discovery models. This study proposes a Neural Network-Long Short-Term Memory (LSTM) POI recommendation system for calculating user similarity based on opinions and preferences. In addition, it presents a method for discovering sequential trip recommendations with Bidirectional Encoder Representations from Transformers (BERT) using a deep learning method. This neural hybrid framework identifies a list of optimal trip candidates by combining personalized POIs with multifaceted context. Furthermore, the method employs the valuable information contained in user posts and their demographic information on social media to mitigate the well-known cold start issue. In the experimental evaluation based on two datasets, Tripadvisor and Yelp, this hybrid method outperforms other state-of-the-art methods when considering F-Score, nDCG, RMSE, and MAP. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
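The user-similarity idea behind POI recommenders like the one above can be sketched as a toy collaborative filter: score users by the overlap of their visited POIs, then suggest unseen POIs from the closest neighbor. The user names, POI sets, and Jaccard measure are invented for illustration; the paper's actual system builds similarity from opinions and uses LSTM and BERT rather than this simplification.

```python
def jaccard(a, b):
    # Set overlap between two users' visited POIs
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(target_pois, others, k=2):
    # Suggest up to k POIs visited by the most similar user but not yet by the target
    best = max(others, key=lambda u: jaccard(target_pois, others[u]))
    return sorted(others[best] - target_pois)[:k]

suggestions = recommend({"museum", "park"},
                        {"u1": {"museum", "park", "tower"},
                         "u2": {"beach", "pier"}})
```

Here "u1" shares two of three POIs with the target, so the unvisited "tower" is suggested.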
21. Social media analysis reveals environmental injustices in Philadelphia urban parks.
- Author
- Walter, Matthew, Bagozzi, Benjamin E., Ajibade, Idowu, and Mondal, Pinki
- Subjects
URBAN parks, NATURAL language processing, RESIDENTIAL segregation, PUBLIC spaces, SOCIAL media, CITY dwellers, REMOTE-sensing images
- Abstract
The United Nations Sustainable Development Goal (SDG) target 11.7 calls for access to safe and inclusive green spaces for all communities. Yet, historical residential segregation in the USA has resulted in poor quality urban parks near neighborhoods with primarily disadvantaged socioeconomic status groups, and an extensive park system that addresses the needs of primarily White middle-class residents. Here we center the voices of historically marginalized urban residents by using Natural Language Processing and Geographic Information Science to analyze a large dataset (n = 143,913) of Google Maps reviews from 2011 to 2022 across 285 parks in the City of Philadelphia, USA. We find that parks in neighborhoods with a high number of residents from historically disadvantaged demographic groups are likely to receive lower scores on Google Maps. Physical characteristics of these parks based on aerial and satellite images and ancillary data corroborate the public perception of park quality. Topic modeling of park reviews reveals that the diverse environmental justice needs of historically marginalized communities must be met to reduce the uneven park quality, a goal in line with achieving SDG 11 by 2030. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
22. Geographical visualization of tweets, misinformation, and extremism during the USA 2020 presidential election using LSTM, NLP, and GIS.
- Author
- Hashemi, Mahdi
- Subjects
UNITED States presidential election, 2020, DEEP learning, GEOGRAPHIC information systems, ONLINE social networks, NATURAL language processing, POLITICAL affiliation, MISINFORMATION
- Abstract
Disinformation campaigns on online social networks (OSN) in recent years have underscored democracies' vulnerability to such operations and the importance of identifying these operations and dissecting their methods, intents, and sources. With a focus on the USA 2020 presidential election, a total of 1,349,373 original Tweets were collected by our server in real time from the beginning of April 2020 to the end of January 2021, using four keywords: Trump, Biden, Democrats, and Republicans. In this work, deep learning, natural language processing, geographic information systems, and statistical tools are used to geographically visualize and discover whether the political misinformation and extremism, political affiliation, and topics of conversation on social media are correlated with the USA 2020 presidential election results. To this end, a deep neural network is trained using 40,000 manually classified Tweets and further used to automatically classify the entire set of Tweets based on their political affiliation, topic, and whether or not they contain misinformation or extremism. It is shown that there is a correlation between the aforementioned classes of Tweets and the election results. In other words, the political affiliation of topics and the extent of misinformation and extremism on social media are correlated with the election results to some level. The strongest correlation highlighted that the ratio of Rightist versus Leftist misinformation Tweets has a 0.67 correlation coefficient with the ratio of Trump votes versus Biden votes across different states. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
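The state-level result above (a 0.67 coefficient between the misinformation-tweet ratio and the vote ratio) is a plain Pearson correlation between two per-state series. A stdlib sketch, with made-up per-state ratios rather than the paper's data:

```python
import math

def pearson(xs, ys):
    # Pearson correlation: covariance divided by the product of standard deviations
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-state ratios (illustrative only):
right_left_misinfo_ratio = [1.8, 0.6, 1.2, 2.1, 0.9]
trump_biden_vote_ratio = [1.5, 0.7, 1.1, 1.9, 0.8]
r = pearson(right_left_misinfo_ratio, trump_biden_vote_ratio)
```

With these toy series the two ratios move together closely, so r comes out near 1; the paper's real data yields the weaker but still notable 0.67.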
23. An improved sentiment classification model based on data quality and word embeddings.
- Author
- Siagh, Asma, Laallam, Fatima Zohra, Kazar, Okba, and Salem, Hajer
- Subjects
SOCIAL media, SOCIAL media in education, USER-generated content, DATA quality, SENTIMENT analysis, BOOSTING algorithms, NATURAL language processing
- Abstract
User-generated content on social media platforms has reached big data levels. Sentiment analysis of this data provides opportunities to gain valuable insights into any domain. However, analyzing real-world data may confront the challenge of class imbalance, which can adversely affect the generalization ability of models due to majority class overfitting. Therefore, an efficient model that manages any scenario of imbalanced data is practically needed. In this light, this work proposes different models based on studying the impact of data quality and transfer learning through pre-trained embeddings on boosting minority class detection. The proposed models are tested on imbalanced datasets related to social media and education. The experimental results highlight the effectiveness of Word2vec, GloVe, and fastText embeddings with preprocessed data. In contrast, BERT embeddings present better results with non-preprocessed data. Furthermore, in comparison with other methods, the best-performing model resulting from this study outperforms them with notable improvements. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
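One standard baseline for the class imbalance discussed above is random oversampling: duplicate minority-class examples until classes are balanced. A minimal sketch; note this specific technique is an assumption for illustration, since the paper tackles imbalance through data quality and pre-trained embeddings rather than resampling.

```python
import random

def oversample(samples, labels, seed=0):
    # Duplicate randomly chosen minority-class examples until every class
    # matches the size of the largest class.
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    target = max(len(v) for v in by_class.values())
    out = []
    for y, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out.extend((s, y) for s in items + extra)
    return out

data = oversample(["a", "b", "c", "d"], ["pos", "pos", "pos", "neg"])
```

Oversampling only rebalances the training split; the test split must keep its natural distribution so the evaluation stays honest.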
24. An exploratory content and sentiment analysis of the guardian metaverse articles using leximancer and natural language processing.
- Author
- Tunca, Sezai, Sezen, Bulent, and Wilk, Violetta
- Subjects
SHARED virtual environments, SENTIMENT analysis, SOCIAL media, USER-generated content, NATURAL language processing, CONTENT analysis, ARTIFICIAL intelligence
- Abstract
The metaverse has become one of the most popular concepts of recent times. Companies and entrepreneurs are fiercely competing to invest and take part in this virtual world. Millions of people globally are anticipated to spend much of their time in the metaverse, regardless of their age, gender, ethnicity, or culture. There are few comprehensive studies on the positive/negative sentiment and effect of the newly identified, but not well defined, metaverse concept that is already fast evolving the digital landscape. Therefore, this study aimed to better understand the metaverse concept by, firstly, identifying its positive and negative sentiment characteristics and, secondly, revealing the associations between the metaverse concept and other related concepts. To do so, this study used Natural Language Processing (NLP) methods, specifically Artificial Intelligence (AI) with computational qualitative analysis. The data comprised metaverse articles from 2021 to 2022 published on The Guardian website, a key global mainstream media outlet. To perform thematic content analysis of the qualitative data, this research used the Leximancer software, and the Natural Language Toolkit (NLTK) from the NLP libraries was used to identify sentiment. Further, an AI-based Monkeylearn API was used to make sectoral classifications of the main topics that emerged in the Leximancer analysis. The key themes which emerged in the Leximancer analysis included "metaverse", "Facebook", "games" and "platforms". The sentiment analysis revealed that of all articles published in the period of 2021–2022 about the metaverse, 61% (n = 622) were positive, 30% (n = 311) were negative, and 9% (n = 90) were neutral. Positive discourses about the metaverse were found to concern key innovations that the virtual experiences brought to users and companies with the support of the technological infrastructure of blockchain, algorithms, and NFTs, led by the gaming world. Negative discourse was found to evidence various problems (misinformation, harmful content, algorithms, data, and equipment) that occur during the use of Facebook and other social media platforms, and that individuals encountered harm in the metaverse or that the metaverse produces new problems. Monkeylearn findings revealed the "marketing/advertising/PR" role, "Recreational" business, and "Science & Technology" events as the key content topics. This study's contribution is twofold: first, it showcases a novel way to triangulate qualitative data analysis of large unstructured textual data as a method in exploring the metaverse concept; and second, the study reveals the characteristics of the metaverse as a concept, as well as its association with other related concepts. Given that the topic of the metaverse is new, this is the first study, to our knowledge, to do both. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
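The positive/negative/neutral percentages reported above reduce to a simple tally over per-article sentiment scores. A sketch using VADER-style compound scores in [-1, 1]; the ±0.05 cutoffs follow VADER's common convention, and the example scores are invented, not drawn from the Guardian corpus.

```python
def sentiment_shares(scores):
    # scores: one compound sentiment score per article,
    # e.g. from NLTK's VADER SentimentIntensityAnalyzer
    n = len(scores)
    pos = sum(s > 0.05 for s in scores)
    neg = sum(s < -0.05 for s in scores)
    neu = n - pos - neg
    return {label: round(100 * count / n, 1)
            for label, count in (("positive", pos),
                                 ("negative", neg),
                                 ("neutral", neu))}

shares = sentiment_shares([0.6, 0.4, -0.3, 0.0, 0.7, -0.5])
```

Scaled up to the 1,023 articles in the study, the same tally yields the reported 61/30/9 percent split.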
25. Twitter sentiment analysis using hybrid gated attention recurrent network.
- Author
- Parveen, Nikhat, Chakrabarti, Prasun, Hung, Bui Thanh, and Shaik, Amjan
- Subjects
SENTIMENT analysis, RECURRENT neural networks, WHITE shark, NATURAL language processing, FEATURE selection, SOCIAL media, DEEP learning
- Abstract
Sentiment analysis is among the most trending and ongoing research topics in the field of data mining. Nowadays, several social media platforms have developed, among which Twitter is a significant tool for sharing and acquiring people's opinions, emotions, views, and attitudes towards particular entities. This makes sentiment analysis a fascinating process in the natural language processing (NLP) domain. Different techniques have been developed for sentiment analysis, but there is still room for further improvement in accuracy and system efficacy. To this end, the proposed architecture combines efficient optimization-based feature selection with deep learning based sentiment analysis. In this work, the sentiment 140 dataset is used for analysing the performance of the proposed gated attention recurrent network (GARN) architecture. Initially, the available dataset is pre-processed to clean and filter it. Then, a term weight-based feature extraction model termed Log Term Frequency-based Modified Inverse Class Frequency (LTF-MICF) is used to extract the sentiment-based features from the pre-processed data. In the third phase, a hybrid mutation-based white shark optimizer (HMWSO) is introduced for feature selection. Using the selected features, the sentiment classes, such as positive, negative, and neutral, are classified using the GARN architecture, which combines recurrent neural networks (RNN) and attention mechanisms. Finally, a performance analysis between the proposed and existing classifiers is performed. The values gained for the evaluated performance metrics using the proposed GARN are an accuracy of 97.86%, precision of 96.65%, recall of 96.76%, and f-measure of 96.70%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
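The LTF-MICF weighting above starts from a log-scaled term frequency. The sketch below shows only that log-TF building block; the "1 + log(count)" form is the common textbook variant and an assumption here, and the modified inverse class frequency factor that LTF-MICF multiplies in is omitted.

```python
import math
from collections import Counter

def log_tf(tokens):
    # Log-scaled term frequency: dampens the influence of very frequent terms
    # relative to raw counts (count 1 -> 1.0, count 3 -> ~2.1, count 100 -> ~5.6)
    counts = Counter(tokens)
    return {term: 1 + math.log(c) for term, c in counts.items()}

weights = log_tf("good good good bad".split())
```

A class-frequency factor would then up-weight terms concentrated in one sentiment class, which is what distinguishes LTF-MICF from plain TF-IDF.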
26. Scaling up the discovery of hesitancy profiles by identifying the framing of beliefs towards vaccine confidence in Twitter discourse.
- Author
- Weinzierl, Maxwell A., Hopfer, Suellen, and Harabagiu, Sanda M.
- Subjects
VACCINATION, CONFIDENCE, ETHICS, COVID-19 vaccines, ATTITUDE (Psychology), SOCIAL media, NATURAL language processing, HEALTH literacy, VACCINE hesitancy, HEALTH attitudes, HUMAN papillomavirus vaccines, DRUG labeling, MISINFORMATION, PHARMACEUTICAL industry, COVID-19 pandemic, TRUST
- Abstract
Our study focused on discovering how vaccine hesitancy is framed in Twitter discourse, allowing us to recognize at scale all tweets that evoke any of the hesitancy framings as well as the stance of the tweet authors towards the frame. By categorizing the hesitancy framings that propagate misinformation, address issues of trust in vaccines, or highlight moral issues or civil rights, we were able to empirically recognize their ontological commitments. The ontological commitments of vaccine hesitancy framings, coupled with the stance of tweet authors, allowed us to identify hesitancy profiles for the two most controversial yet effective and underutilized vaccines for which there remains substantial reluctance among the public: the Human Papillomavirus and the COVID-19 vaccines. The discovered hesitancy profiles inform public health messaging approaches to effectively reach Twitter users, with the promise of shifting or bolstering vaccine attitudes. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
27. Sentiment analysis in tweets: an assessment study from classical to modern word representation models.
- Author
- Barreto, Sérgio, Moura, Ricardo, Carvalho, Jonnathan, Paes, Aline, and Plastino, Alexandre
- Subjects
SENTIMENT analysis, USER-generated content, NATURAL language processing, SOCIAL media, INFORMATION resources, SOCIAL networks, NAIVE Bayes classification, CLASSIFICATION algorithms
- Abstract
With the exponential growth of social media networks, such as Twitter, plenty of user-generated data emerges daily. The short texts published on Twitter – the tweets – have earned significant attention as a rich source of information to guide many decision-making processes. However, their inherent characteristics, such as their informal and noisy linguistic style, remain challenging to many natural language processing (NLP) tasks, including sentiment analysis. Sentiment classification is tackled mainly by machine learning-based classifiers. The literature has adopted different types of word representation models to transform tweets into vector-based inputs to feed sentiment classifiers. The representations range from simple count-based methods, such as bag-of-words, to more sophisticated ones, such as BERTweet, built upon the trendy BERT architecture. Nevertheless, most studies mainly focus on evaluating those models using only a small number of datasets. Despite the progress made in recent years in language modeling, there is still a gap regarding a robust evaluation of induced embeddings applied to sentiment analysis on tweets. Furthermore, while fine-tuning models on downstream tasks is prominent nowadays, less attention has been given to adjustments based on the specific linguistic style of the data. In this context, this study presents an assessment of existing neural language models in distinguishing the sentiment expressed in tweets, using a rich collection of 22 datasets from distinct domains and five classification algorithms. The evaluation includes static and contextualized representations. Contexts are assembled from Transformer-based autoencoder models that are also adapted based on the masked language model task, using a plethora of strategies. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
28. A Bilingual Comparison of Sentiment and Topics for a Product Event on Twitter.
- Author
- Wedel, Irina, Palk, Michael, and Voß, Stefan
- Subjects
NATURAL language processing, SOCIAL media in business, SOCIAL media, SENTIMENT analysis, USER-generated content, MICROBLOGS, NEW product development, GERMAN language
- Abstract
Social media enable companies to assess consumers' opinions, complaints and needs. The systematic and data-driven analysis of social media to generate business value is summarized under the term Social Media Analytics which includes statistical, network-based and language-based approaches. We focus on textual data and investigate which conversation topics arise during the time of a new product introduction on Twitter and how the overall sentiment is during and after the event. The analysis via Natural Language Processing tools is conducted in two languages and four different countries, such that cultural differences in the tonality and customer needs can be identified for the product. Different methods of sentiment analysis and topic modeling are compared to identify the usability in social media and in the respective languages English and German. Furthermore, we illustrate the importance of preprocessing steps when applying these methods and identify relevant product insights. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
29. News sharing on Twitter reveals emergent fragmentation of media agenda and persistent polarization.
- Author
- Cicchini, Tomas, del Pozo, Sofia Morena, Tagliazucchi, Enzo, and Balenzuela, Pablo
- Subjects
POLARIZATION (Social sciences), BIPARTITE graphs, AFFINITY groups, COMMUNITIES, PUBLIC opinion
- Abstract
News sharing on social networks reveals how information disseminates among users. This process, constrained by user preferences and social ties, plays a key role in the formation of public opinion. In this work, we used bipartite news-user networks to study the news sharing behavior of the main Argentinian media outlets on Twitter. Our objective was to understand the role of political polarization in the emergence of high-affinity groups with respect to news sharing. We compared results between years with and without presidential elections, and between groups of politically active and inactive users, the latter serving as a control group. The behavior of users resulted in well-differentiated communities of news articles identified by a unique distribution of media outlets. In particular, the structure of these communities revealed the dominant ideological polarization in Argentina. We also found that users formed two groups identified by their consumption of media outlets, which also displayed a bias towards the two main parties that dominate political life in Argentina. Overall, our results consistently identified ideological polarization as a main driving force underlying Argentinian news sharing behavior on Twitter. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
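Bipartite news-user networks like the one above are commonly analyzed by projecting onto one node set before community detection: two articles are linked with a weight equal to the number of users who shared both. A stdlib sketch of that projection, with invented user and article ids:

```python
from itertools import combinations
from collections import Counter

def project_articles(shares):
    # shares: {user: set of article ids shared by that user}
    # Returns edge weights between article pairs = number of co-sharing users.
    weights = Counter()
    for articles in shares.values():
        for a, b in combinations(sorted(articles), 2):
            weights[(a, b)] += 1
    return weights

edges = project_articles({
    "u1": {"n1", "n2"},
    "u2": {"n1", "n2"},
    "u3": {"n3"},
})
```

Community detection on the resulting weighted article graph then surfaces the well-differentiated article communities the study reports.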
30. Analysis of the causes of inferiority feelings based on social media data with Word2Vec.
- Author
- Liu, Yu, Xu, Chen, Kuai, Xi, Deng, Hao, Wang, Kaifeng, and Luo, Qinyao
- Subjects
SOCIAL media, NATURAL language processing, HELP-seeking behavior, EMOTIONS
- Abstract
Feelings of inferiority are complex emotions that usually indicate perceived weakness and helplessness. A lack of timely and effective interventions may bring serious consequences to individuals with inferiority feelings. Due to privacy concerns, those people often hesitate to seek face-to-face help, but they usually spontaneously share their feelings on social media, which makes social media a good resource for ample inferiority-related data. We randomly selected a sample of posts indicating inferiority feelings to explore the causes of inferiority. Through language analysis and natural language processing, we constructed a Word2Vec model of inferiority based on social media data and applied it to the cause analysis of inferiority feelings. The main causes of inferiority feelings are personal experience, social interaction, love relationships, etc. People who feel inferior about their personal experiences are usually influenced largely by their ways of thinking and life attitudes. Social and emotional factors overlap somewhat in the development of inferiority. In love relationships, males are more prone to inferiority feelings than females. These findings will help relevant institutions and organizations better understand people with inferiority feelings and facilitate the development of targeted treatment for those with potential self-esteem problems. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
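Once a Word2Vec model like the one above is trained, cause-related terms are typically explored via nearest neighbors in the embedding space (what gensim exposes as `most_similar`). A stdlib sketch over toy 2-d vectors; the vocabulary and vector values are invented for illustration, whereas a real model would supply trained high-dimensional embeddings.

```python
import math

def most_similar(word, vectors, topn=2):
    # Rank the rest of the vocabulary by cosine similarity to `word`
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)
    query = vectors[word]
    ranked = sorted(((w, cos(query, v)) for w, v in vectors.items() if w != word),
                    key=lambda pair: -pair[1])
    return ranked[:topn]

# Toy 2-d "embeddings" (illustrative, not trained):
vecs = {"inferiority": [0.9, 0.1], "helpless": [0.85, 0.2],
        "holiday": [0.1, 0.9], "weak": [0.8, 0.15]}
top = most_similar("inferiority", vecs, topn=1)
```

Clustering or inspecting such neighbor lists is one way the cause categories (personal experience, social interaction, love relationships) can be surfaced from raw posts.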
31. A multi-label ensemble predicting model to service recommendation from social media contents.
- Author
- Jain, Praphula Kumar, Pamula, Rajendra, and Yekun, Ephrem Admasu
- Subjects
SUPPORT vector machines, K-nearest neighbor classification, NAIVE Bayes classification, USER-generated content, RANDOM forest algorithms, SOCIAL media
- Abstract
Consumer sentiment is one of the essential measures of predictive recommendations in travel and tourism. Nowadays, a massive amount of data related to consumer sentiment is available on online platforms, which may help draw insights into how consumers provide feedback and how we can use that feedback to predict recommendations using machine learning techniques. In this study, we have designed a method that predicts consumer recommendations in travel and tourism, particularly in the case of airlines. We developed our predictive method as a multi-label classification system. We implemented K-Nearest Neighbors, Support Vector Machine, Multi-layer Perceptron, Logistic Regression, Random Forest, and Ensemble Learning as basic classification models to train our model. Further, we boosted our predictive model by implementing state-of-the-art partitioning methods to partition the label space into lower spaces, utilizing the label space partitioning approaches RAkELo and Louvain as transformation techniques to turn every label set into a multi-class classification problem. The suggested model obtained higher performance in terms of accuracy using various evaluation measures compared to other binary classifications. Furthermore, travelers may benefit from this approach when making decisions before travel. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
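The label-space transformation above builds on the basic label-powerset idea: each distinct combination of labels becomes one class of an ordinary multi-class problem. The sketch below shows only that core step; RAkELo additionally partitions the label space into random overlapping subsets before applying it, which is omitted here, and the example labels are invented.

```python
def label_powerset(y_multilabel):
    # Map each distinct label combination (a set of labels) to one class id,
    # turning a multi-label problem into a multi-class one.
    classes = {}
    y_multiclass = []
    for labels in y_multilabel:
        key = frozenset(labels)
        y_multiclass.append(classes.setdefault(key, len(classes)))
    return y_multiclass, classes

y, classes = label_powerset([{"food", "service"}, {"food"}, {"food", "service"}])
```

Any single-label classifier (SVM, Random Forest, etc., as in the study) can then be trained on the transformed targets; predictions are mapped back to label sets through the `classes` dictionary.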
32. Sarcastic user behavior classification and prediction from social media data using firebug swarm optimization-based long short-term memory.
- Author
- Karthik, E. and Sethukarasi, T.
- Subjects
SOCIAL prediction, SOCIAL media, FACIAL expression, CLASSIFICATION, HISTORY of accounting
- Abstract
Sarcasm is a type of speech where people use positive words to convey a negative message. Recently, only a few studies have been presented that focus on the entire spectrum of sarcasm in order to identify sarcastic sentiments present in both images and text. This work presents a novel firebug swarm optimization-based long short-term memory (FSO-LSTM) architecture to identify the sarcastic sentiments present in tweets. To identify the facial expressions of users, the proposed FSO-based LSTM architecture is trained using the CK+ dataset. The FSO algorithm is used to optimize the weighting factors of the LSTM architecture and also to minimize the root-mean-square error (RMSE) and mean absolute error. The proposed method primarily attempts to address two challenging issues in sarcasm detection: the high number of false negatives and the fact that polite tweets often go undetected. The user's mood changes (sarcastic) such as rude, polite, furious, and impassive can be identified using the proposed model. Hence, the proposed classifier is capable of analyzing the behavior change of a user by collecting their past Twitter account history. The efficiency of the proposed methodology is evaluated using different performance metrics such as accuracy, RMSE, confusion matrix, and loss. The proposed methodology offers an average classification accuracy of 97.25% when compared to the state-of-the-art approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
33. Grass-roots entrepreneurship complements traditional top-down innovation in lung and breast cancer.
- Author
- Ramadi, Khalil B., Mehta, Rhea, He, David, Chao, Sichen, Chu, Zen, Atun, Rifat, and Nguyen, Freddy T.
- Subjects
LUNG cancer, BREAST cancer, NATURAL language processing, MICROBLOGS, MEDICAL research, PUBLIC opinion, SEARCH engines, INVESTMENTS, ENTREPRENEURSHIP, SOCIAL media, MORTALITY, LUNG tumors, DISEASE incidence, ENDOWMENT of research, SOCIOECONOMIC factors, MAPS, DIFFUSION of innovations, BREAST tumors
- Abstract
The majority of biomedical research is funded by public, governmental, and philanthropic grants. These initiatives often shape the avenues and scope of research across disease areas. However, the prioritization of disease-specific funding is not always reflective of the health and social burden of each disease. We identify a prioritization disparity between lung and breast cancers, whereby lung cancer contributes to a substantially higher socioeconomic cost on society yet receives significantly less funding than breast cancer. Using search engine results and natural language processing (NLP) of Twitter tweets, we show that this disparity correlates with enhanced public awareness and positive sentiment for breast cancer. Interestingly, disease-specific venture activity does not correlate with funding or public opinion. We use outcomes from recent early-stage innovation events focused on lung cancer to highlight the complementary mechanism by which bottom-up "grass-roots" initiatives can identify and tackle under-prioritized conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
34. Toward multi-label sentiment analysis: a transfer learning based approach.
- Author
- Tao, Jie and Fang, Xing
- Subjects
SENTIMENT analysis ,NATURAL language processing ,SOCIAL media ,LABELS ,MODEL railroads ,AUTOMATED guided vehicle systems - Abstract
Sentiment analysis is recognized as one of the most important sub-areas in Natural Language Processing (NLP) research, where understanding implicit or explicit sentiments expressed in social media content is valuable to customers, business owners, and other stakeholders. Researchers have recognized that the generic sentiments extracted from textual content are inadequate; thus, Aspect Based Sentiment Analysis (ABSA) was coined to capture sentiments expressed toward specific review aspects. Existing ABSA methods not only treat the analytical problem as single-label classification, which requires a fairly large amount of labelled data for model training, but also underestimate entity aspects that are independent of particular sentiments. In this study, we propose a transfer learning based approach that tackles these shortcomings of existing ABSA methods. Firstly, the proposed approach extends ABSA methods with multi-label classification capabilities. Secondly, we propose an advanced sentiment analysis method, namely Aspect Enhanced Sentiment Analysis (AESA), to classify text into sentiment classes with consideration of the entity aspects. Thirdly, we extend two state-of-the-art transfer learning models as the analytical vehicles of the multi-label ABSA and AESA tasks. We design an experiment that includes data from different domains to extensively evaluate the proposed approach. The empirical results show that the proposed approach outperforms all baseline approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
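The move from single-label to multi-label classification that this abstract describes amounts to replacing a softmax over mutually exclusive classes with independent per-label sigmoids and a decision threshold. A minimal sketch of that decision rule (the aspect labels and logits below are hypothetical, not from the paper):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def multilabel_predict(logits, labels, threshold=0.5):
    # Each label is scored independently, so any subset of labels may fire;
    # under softmax the probabilities compete and exactly one label wins.
    return [lab for lab, z in zip(labels, logits) if sigmoid(z) >= threshold]

labels = ["food", "service", "price", "ambience"]  # hypothetical review aspects
logits = [2.1, -0.4, 0.7, -3.0]                    # hypothetical model outputs
print(multilabel_predict(logits, labels))
```

In a transfer-learning setup such as the one described, only this output head changes; the pretrained encoder underneath is shared across the single-label and multi-label tasks.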
35. Epidemiology from Tweets: Estimating Misuse of Prescription Opioids in the USA from Social Media.
- Author
- Chary, Michael, Genes, Nicholas, Giraud-Carrier, Christophe, Hanson, Carl, Nelson, Lewis, and Manini, Alex
- Subjects
SOCIAL media ,EPIDEMIOLOGY ,OPIOID abuse ,DRUG abuse risk factors ,DRUG abuse treatment ,COMPUTATIONAL linguistics ,MEDICAL communication ,ELECTRONIC health records - Abstract
Background: The misuse of prescription opioids (MUPO) is a leading public health concern. Social media are playing an expanded role in public health research, but there are few methods for estimating established epidemiological metrics from social media. The purpose of this study was to demonstrate that the geographic variation of social media posts mentioning prescription opioid misuse strongly correlates with government estimates of MUPO in the last month. Methods: We wrote software to acquire publicly available tweets from Twitter from 2012 to 2014 that contained at least one keyword related to prescription opioid use (n = 3,611,528). A medical toxicologist and emergency physician curated the list of keywords. We used the semantic distance (SemD) to automatically quantify the similarity of meaning between tweets and identify tweets that mentioned MUPO. We defined the SemD between two words as the shortest distance between the two corresponding word-centroids. Each word-centroid represented all recognized meanings of a word. We validated this automatic identification with manual curation. We used Twitter metadata to estimate the location of each tweet. We compared our estimated geographic distribution with the 2013-2015 National Surveys on Drug Use and Health (NSDUH). Results: Tweets that mentioned MUPO formed a distinct cluster far away from semantically unrelated tweets. The state-by-state correlation between Twitter and NSDUH was highly significant across all NSDUH survey years. The correlation was strongest between Twitter and NSDUH data from those aged 18-25 (r = 0.94, p < 0.01 for 2012; r = 0.94, p < 0.01 for 2013; r = 0.71, p = 0.02 for 2014). The correlation was driven by discussions of opioid use, even after controlling for geographic variation in Twitter usage. Conclusions: Mentions of MUPO on Twitter correlate strongly with state-by-state NSDUH estimates of MUPO. We have also demonstrated that natural language processing can be used to analyze social media to provide insights for syndromic toxicosurveillance. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
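The abstract defines the semantic distance (SemD) between two words as the shortest distance between their word-centroids, where each centroid represents all recognized meanings of a word. A minimal sketch under the assumption that a centroid is the mean of a word's sense vectors and the distance is Euclidean (the study's actual embedding space and metric are not specified in the abstract; the 2-D vectors below are made up):

```python
import math

def centroid(sense_vectors):
    # One centroid per word: the mean of that word's sense vectors
    dims = len(sense_vectors[0])
    return [sum(v[d] for v in sense_vectors) / len(sense_vectors) for d in range(dims)]

def semantic_distance(senses_a, senses_b):
    # Euclidean distance between the two word-centroids (an assumption;
    # the paper only states "shortest distance between word-centroids")
    ca, cb = centroid(senses_a), centroid(senses_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))

# Hypothetical 2-D sense embeddings for two words
word_a = [[1.0, 2.0], [3.0, 2.0]]   # centroid (2.0, 2.0)
word_b = [[2.0, 6.0], [2.0, 2.0]]   # centroid (2.0, 4.0)
print(semantic_distance(word_a, word_b))
```

Collapsing all senses into one centroid is what lets a single distance threshold separate MUPO-related tweets from semantically unrelated ones, as the clustering result in the abstract describes.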
36. Sentiment analysis methods for understanding large-scale texts: a case for using continuum-scored words and word shift graphs.
- Author
- Reagan, Andrew, Danforth, Christopher, Tivnan, Brian, Williams, Jake, and Dodds, Peter
- Subjects
SENTIMENT analysis ,HUMAN behavior ,LEXICON ,SOCIAL media ,ACCURACY - Abstract
The emergence and global adoption of social media has rendered possible the real-time estimation of population-scale sentiment, an extraordinary capacity which has profound implications for our understanding of human behavior. Given the growing assortment of sentiment-measuring instruments, it is imperative to understand which aspects of sentiment dictionaries contribute to both their classification accuracy and their ability to provide richer understanding of texts. Here, we perform detailed, quantitative tests and qualitative assessments of 6 dictionary-based methods applied to 4 different corpora, and briefly examine a further 20 methods. We show that while inappropriate for sentences, dictionary-based methods are generally robust in their classification accuracy for longer texts. Most importantly, they can aid understanding of texts with reliable and meaningful word shift graphs if (1) the dictionary covers a sufficiently large portion of a given text's lexicon when weighted by word usage frequency; and (2) words are scored on a continuous scale. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
37. Location detection and disambiguation from twitter messages.
- Author
- Inkpen, Diana, Liu, Ji, Farzindar, Atefeh, Kazemi, Farzaneh, and Ghazi, Diman
- Subjects
INTERNET ,ARTIFICIAL intelligence ,MACHINE learning ,DATA mining - Abstract
A remarkable number of Twitter messages are generated every second. Detecting the location entities mentioned in these messages is useful in text mining applications. Therefore, techniques for extracting the location entities from the Twitter textual content are needed. In this work, we approach this task in a similar manner to the Named Entity Recognition (NER) task, but we focus only on locations, while NER systems detect names of persons, organizations, locations, and sometimes more (e.g., dates, times). But, unlike NER systems, we address a deeper task: classifying the detected locations into names of cities, provinces/states, and countries in order to map them into physical locations. We approach the task in a novel way, consisting of two stages. In the first stage, we train Conditional Random Fields (CRF) models that are able to detect the locations mentioned in the messages. We train three classifiers: one for cities, one for provinces/states, and one for countries, with various sets of features. Since a dataset annotated with this kind of information was not available, we collected and annotated our own dataset to use for training and testing. In the second stage, we resolve the remaining ambiguities, namely, cases when there exists more than one place with the same name. We propose a set of heuristics able to choose the correct physical location in these cases. Our two-stage model will allow a social media monitoring system to visualize the places mentioned in Twitter messages on a map of the world or to compute statistics about locations. This kind of information can be of interest to business or marketing applications. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
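The second stage described above resolves cases where several places share a name. A minimal sketch of the flavor of heuristic such a system might use, preferring a candidate whose country also appears in the tweet and otherwise falling back to the most populous place (the gazetteer entries and rules below are illustrative guesses, not the paper's actual heuristics):

```python
# Hypothetical gazetteer entries: (city, country, population)
GAZETTEER = {
    "london": [("London", "United Kingdom", 8_800_000),
               ("London", "Canada", 420_000)],
}

def disambiguate(name, tweet_text):
    candidates = GAZETTEER.get(name.lower(), [])
    if not candidates:
        return None
    # Heuristic 1: prefer a candidate whose country is mentioned in the tweet
    for city, country, pop in candidates:
        if country.lower() in tweet_text.lower():
            return (city, country)
    # Heuristic 2: otherwise fall back to the most populous candidate
    city, country, _ = max(candidates, key=lambda c: c[2])
    return (city, country)

print(disambiguate("London", "Snowy day in London, Canada today"))
print(disambiguate("London", "Visiting London next week"))
```

Rules of this kind only fire after the CRF stage has already tagged the span as a location, which is what keeps the disambiguation problem tractable.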
Discovery Service for Jio Institute Digital Library