7,304 results for "lexicon"
Search Results
2. The Morphological Processes Involved in the Lexicon of Iligan City’s Tambal Binisaya
- Author
-
Lourd Greggory D. Crisol
- Subjects
alternative medicine, lexicon, morphology, folk customs, material culture, Social sciences (General), H1-99, Technology (General), T1-995, Business, HF5001-6182
- Abstract
Tambal Binisaya refers to the folk medicine used in Iligan City, Philippines. These medicines have been used for decades by mananambals or local folk healers and residents from low-income groups. However, because of modern treatments, the locals have started to move away from Tambal Binisaya. Because of this, many residents have become unfamiliar with Tambal Binisaya and the terms used in the trade. Thus, this ethnolinguistic study looked into the lexicon of Tambal Binisaya, specifically the word formation processes involved in the names of these folk medicines. In doing so, the researcher employed interviews, observations, and field notes in data gathering. Based on the results of the study, the morphological processes used are affixation, enclitization, reduplication, metanalysis, compounding, blending, borrowing, and coining. It is concluded that a rich trove of lexical items indeed lies in Tambal Binisaya, which should be given more societal and academic attention.
- Published
- 2020
- Full Text
- View/download PDF
3. Lexicon Pharmaceuticals, Inc. Files SEC Form 10-Q, Quarterly Report [Sections 13 Or 15(D)]: (May 2, 2024).
- Subjects
QUARTERLY reports, CORPORATE finance, LEXICON, UNAUDITED financial statements, DRUGS
- Abstract
Lexicon Pharmaceuticals, Inc. has filed a Form 10-Q with the U.S. Securities and Exchange Commission (SEC) on May 2, 2024. The company's SIC code is 2834, Pharmaceutical Preparations. The Form 10-Q is a quarterly report that includes unaudited financial statements and provides a view of the company's financial position. This filing is a formal document submitted by publicly-traded companies to the SEC. For more information, the SEC filing can be found at the provided link. [Extracted from the article]
- Published
- 2024
4. Lexicon Pharmaceuticals, Inc. Files SEC Form 10-K, Annual Report [Section 13 And 15(D), Not S-k Item 405]: (Mar. 25, 2024).
- Subjects
CORPORATION reports, LEXICON, PHARMACEUTICAL biotechnology industry, DRUGS, BIOTECHNOLOGY industries
- Abstract
Lexicon Pharmaceuticals, Inc. has filed a Form 10-K with the U.S. Securities and Exchange Commission (SEC). The filing provides a comprehensive overview of the company's business and must be submitted within 90 days after the end of the fiscal year. Lexicon Pharmaceuticals is a biopharmaceutical company in the healthcare biotechnology industry, with a SIC code of 2834 for Pharmaceutical Preparations. The company is located in The Woodlands, Texas. This news article provides additional information about the SEC filing and includes keywords such as Lexicon Pharmaceuticals Inc., Business, SEC Filing, Health and Medicine, Biopharmaceutical Companies, and Healthcare Biotechnology Companies. [Extracted from the article]
- Published
- 2024
5. Moroccan Arabic vocabulary generation using a rule-based approach
- Author
-
Ridouane Tachicart and Karim Bouzoubaa
- Subjects
Vocabulary, General Computer Science, Machine translation, Computer science, Concatenation, Spell checking, Rule-based system, Lexicon, Set (abstract data type), State (computer science), Artificial intelligence, Natural language processing
- Abstract
NLP resources play a crucial role in the building of many NLP applications. The importance of these resources depends not only on their size and coverage but also on the richness and the precision of the annotated information they provide. In the case of resource-scarce languages such as Moroccan Arabic, the building of NLP applications is limited due to the lack of these resources. To overcome this problem, we follow a rule-based approach to generate a Moroccan morphological vocabulary (MORV), which constitutes the first step in addressing the problem of Moroccan morphological generation. MORV is designed and implemented around two main components: on one hand, an MA lexicon and a list of fully annotated affixes and clitics that we created specifically to support the generation process; on the other hand, a set of rules covering the concatenation and the orthographic adjustments of the generated words. Moreover, given a base form, MORV outputs more than 4.5 M Moroccan words with rich morphological features such as tense, gender, number, state, etc. We tested the coverage of MORV on texts collected from Moroccan social media and found that it reaches a vocabulary coverage of 84% and a precision of 94%. This system benefits the building of other NLP applications such as spell checking, morphological analysis, and machine translation.
- Published
- 2022
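The generation process the MORV abstract describes (base form + annotated affixes/clitics, followed by orthographic adjustment rules) can be sketched minimally. All entries, feature names, and the single toy adjustment rule below are invented placeholders, not MORV's actual data or rules:

```python
# Hypothetical sketch of rule-based vocabulary generation: concatenate a base
# form with annotated prefixes/suffixes, then apply an orthographic rule.
PREFIXES = [("ka", {"tense": "present"}), ("gha", {"tense": "future"})]
SUFFIXES = [("t", {"person": "1sg"}), ("na", {"person": "1pl"})]

def adjust(word):
    # Toy orthographic rule: collapse doubled letters at morpheme boundaries.
    out = [word[0]]
    for ch in word[1:]:
        if ch != out[-1]:
            out.append(ch)
    return "".join(out)

def generate(base):
    forms = []
    for pre, pre_feats in PREFIXES:
        for suf, suf_feats in SUFFIXES:
            feats = dict(pre_feats)
            feats.update(suf_feats)
            forms.append((adjust(pre + base + suf), feats))
    return forms

forms = generate("ktb")  # an invented base form
```

Each generated surface form carries its merged feature bundle, which is the kind of rich morphological annotation (tense, person, etc.) the abstract attributes to MORV's output.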
6. Offline Arabic handwritten word recognition: A transfer learning approach
- Author
-
Mohamed Awni, Hazem M. Abbas, and Mahmoud I. Khalil
- Subjects
General Computer Science, Artificial neural network, Computer science, Deep learning, Lexicon, Convolution, Word recognition, Artificial intelligence, Transfer of learning, Natural language processing
- Abstract
Offline Arabic handwritten word recognition is still a challenging task. Many deep learning approaches perform admirably on this task if the lexicon size is not too large and the number of training samples is sufficient for the training process. The transfer learning technique is commonly used to compensate for the lack of training samples, but there is wide controversy about the effectiveness of applying it to cross-domain tasks. In this paper, we examine the performance of three randomly initialized deep convolutional neural networks for recognizing Arabic handwritten words. Then, we evaluate the performance of a ResNet18 model pre-trained on the ImageNet dataset for the same task. Finally, we propose an approach based on sequentially transferring the mid-level word image representations through two consecutive phases using the ResNet18 model. We carried out four different sets of experiments using two popular offline Arabic handwritten word datasets, AlexU-W and IFN/ENIT (v2.0p1e), to determine the most effective way of applying transfer learning. Our results demonstrate that using ImageNet as a source dataset improves the recognition accuracy of the ten most frequently misclassified words in the IFN/ENIT dataset by 14%, while our proposed approach yields an improvement of 35.45%. On the whole dataset, we achieved recognition accuracy of up to 96.11%, nearly a 2.5% enhancement compared with other state-of-the-art methods.
- Published
- 2022
7. Attention-based position-aware framework for aspect-based opinion mining using bidirectional long short-term memory
- Author
-
Chetana Prakash and Azizkhan F Pathan
- Subjects
General Computer Science, Computer science, Deep learning, Sentiment analysis, Context (language use), Lexicon, SemEval, Artificial intelligence, Natural language processing, Sentence
- Abstract
Aspect-based Opinion Mining is a form of fine-grained Sentiment Analysis that models the semasiological relationship between aspect terms and context words in a sentence. The presence of a variety of context words has a significant impact on a sentence's sentiment polarity. As a result, while designing a model, it is necessary to consider the interaction of aspects and context words. Although existing approaches have taken into account an aspect's position in a sentence, much of the research has not explored the use of Sentiment Lexicons together with Deep Learning algorithms. In this paper, we propose a framework for an Attention-based position-aware Bidirectional Long Short-Term Memory network for Aspect-based Opinion Mining that incorporates a Sentiment Intensity Lexicon. The aspect word's pre-trained vector is adjusted to be closer to semantically and sentimentally similar nearest neighbors and further away from sentimentally dissimilar neighbors. The proposed framework calculates aspect weights by concatenating external knowledge, in the form of lexicon sentiment intensity scores, with word embeddings and position information. The framework is evaluated on the SemEval 2014 dataset. The results of the experiments illustrate that injecting external knowledge into the Bidirectional Long Short-Term Memory network can improve classification accuracy significantly.
- Published
- 2022
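The input-construction step that abstract describes (concatenating word embeddings with position information and lexicon intensity scores before the BiLSTM) can be sketched as follows. The tiny lexicon, the two-dimensional embeddings, and the particular position formula are all invented for illustration:

```python
# Sketch: build per-token input vectors by concatenating a word embedding,
# a relative-position feature (proximity to the aspect term), and a lexicon
# sentiment-intensity score. All values are invented placeholders.
LEXICON = {"great": 0.8, "terrible": -0.9}   # hypothetical intensity scores
EMB = {"food": [0.1, 0.2], "great": [0.3, 0.1], "terrible": [0.0, 0.4]}

def build_inputs(tokens, aspect_index):
    rows = []
    for i, tok in enumerate(tokens):
        emb = EMB.get(tok, [0.0, 0.0])
        position = 1.0 - abs(i - aspect_index) / len(tokens)  # closer => larger
        intensity = LEXICON.get(tok, 0.0)
        rows.append(emb + [position, intensity])
    return rows

rows = build_inputs(["food", "is", "great"], aspect_index=0)
```

A recurrent network would then consume these augmented vectors in place of plain embeddings, which is the "external knowledge injection" the abstract credits for the accuracy gain.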
8. Lexicon-Based Sentiment Convolutional Neural Networks for Online Review Analysis
- Author
-
Yanghui Rao, Haoran Xie, Yuwei Liu, Minghui Huang, Leonard K. M. Poon, and Fu Lee Wang
- Subjects
Computer science, Sentiment analysis, Review analysis, Lexicon, Convolutional neural network, Popularity, Human-Computer Interaction, Identification (information), Contextual information, Artificial intelligence, Software, Natural language processing
- Abstract
With the growing availability and popularity of sentiment-rich resources like blogs and online reviews, new opportunities and challenges have emerged regarding the identification, extraction, and organization of sentiments from user-generated documents or sentences. Recently, many studies have exploited lexicon-based methods or supervised learning algorithms to separately conduct sentiment analysis tasks; however, the former approaches ignore contextual information of sentences and the latter ones do not take sentiment information embedded in sentiment words into consideration. To tackle these limitations, we propose a new model named Sentiment Convolutional Neural Network (SentiCNN) to analyze the sentiments of sentences with both contextual and sentiment information of sentiment words, in which, contextual information is captured from word embeddings and sentiment information is identified using existing lexicons. We incorporate a Highway Network into our model to adaptively combine sentiment and contextual information from sentences by strengthening the connection between features of both sentences and their sentiment words. Furthermore, we propose three lexicon-based attention mechanisms (LBAMs) for our SentiCNN model to find the most important indicators of sentiments and make predictions more effectively. Experiments over two well-known datasets indicate that sentiment words, the Highway Network, and LBAMs contribute to sentiment analysis.
- Published
- 2022
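The Highway Network component that the SentiCNN abstract says "adaptively combines sentiment and contextual information" can be sketched in its simplest scalar form. The single-gate formulation and the weights below are invented simplifications, not the paper's architecture:

```python
# Minimal highway-layer sketch: a transform gate t blends a transformed
# candidate h with the raw input x, so the layer can adaptively pass
# through or rewrite each feature. Weights are invented scalars.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def highway(x, w_h, w_t):
    # output_i = t_i * h_i + (1 - t_i) * x_i, with h = tanh(w_h * x_i)
    out = []
    for xi in x:
        h = math.tanh(w_h * xi)
        t = sigmoid(w_t * xi)
        out.append(t * h + (1.0 - t) * xi)
    return out

y = highway([0.5, -1.0], w_h=1.0, w_t=2.0)
```

When the gate saturates near 0 the layer behaves like an identity connection, which is the property that lets such models strengthen the link between sentence features and sentiment-word features without losing either.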
9. Automated Classification of Societal Sentiments on Twitter With Machine Learning
- Author
-
Ganga Prasad Basyal, Piyush Vyas, Bhaskar P. Rimal, Prathamesh Muzumdar, Martin Reisslein, and Gitika Vyas
- Subjects
Artificial neural network, Microblogging, Computer science, Information sharing, Sentiment analysis, General Medicine, General Chemistry, Machine learning, Lexicon, Popularity, Social media, Artificial intelligence, F1 score
- Abstract
The rapid growth in information sharing on social media has defined a new information age in our society. Microblogging sites, such as Twitter, gained immense popularity during the COVID-19 pandemic. We developed an automated framework to extract the positive, negative, and neutral sentiments from tweets, and to further classify the tweets through machine learning techniques. The developed framework can help to understand the sentiments in our society during profound events, such as the COVID-19 pandemic. Our framework is novel in that it is a hybrid framework that combines a lexicon-based technique for tweet sentiment analysis and labeling with supervised machine learning techniques for tweet classification. We have evaluated the hybrid framework with a range of measures, such as precision, accuracy, recall, and F1 score. Our results indicate that the majority of the sentiments are positive (38.5%) or neutral (34.7%). Furthermore, with an accuracy of 83%, the Long Short-Term Memory (LSTM) neural network has been selected as the preferred machine learning technique in the framework. The evaluation results indicate that our hybrid framework has the potential to automatically classify large tweet volumes, such as the tweets on COVID-19, according to the sentiments in the society.
- Published
- 2022
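The hybrid design described above pairs a lexicon-based labeling stage with supervised classifiers. The labeling stage can be sketched minimally; the tiny polarity lexicon and the simple sum-of-scores rule are invented stand-ins for whatever lexicon the framework actually uses:

```python
# Sketch of lexicon-based tweet labeling: sum toy polarity scores over
# tokens and map the total to positive/negative/neutral. These labels
# could then serve as training targets for a supervised classifier.
POLARITY = {"good": 1, "happy": 1, "bad": -1, "sad": -1}  # invented lexicon

def label(tweet):
    score = sum(POLARITY.get(w, 0) for w in tweet.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

labels = [label(t) for t in ["good happy day", "so bad", "just news"]]
```

The appeal of the hybrid scheme is that the lexicon stage produces labels without manual annotation, letting the supervised stage (an LSTM in the paper) scale to large tweet volumes.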
10. Two-Level LSTM for Sentiment Analysis With Lexicon Embedding and Polar Flipping
- Author
-
Mengyang Li, Tao Yang, Ming Li, and Ou Wu
- Subjects
Computer science, Context (language use), Lexicon, Text mining, Classifier (linguistics), Sentiment analysis, Data mining, Electrical and Electronic Engineering, Semantics, Computer Science Applications, Human-Computer Interaction, Control and Systems Engineering, Benchmark (computing), Embedding, Artificial intelligence, Algorithms, Software, Natural language processing, Sentence, Information Systems
- Abstract
Sentiment analysis is a key component in various text mining applications. Numerous sentiment classification techniques, including conventional and deep-learning-based methods, have been proposed in the literature. In most existing methods, a high-quality training set is assumed to be given. Nevertheless, constructing a high-quality training set that consists of highly accurate labels is challenging in real applications. This difficulty stems from the fact that text samples usually contain complex sentiment representations, and their annotation is subjective. We address this challenge in this study by leveraging a new labeling strategy and utilizing a two-level long short-term memory network to construct a sentiment classifier. Lexical cues are useful for sentiment analysis, and they have been utilized in conventional studies. For example, polar and negation words play important roles in sentiment analysis. A new encoding strategy, ρ-hot encoding, is proposed to alleviate the drawbacks of one-hot encoding and, thus, effectively incorporate useful lexical cues. Moreover, the sentimental polarity of a word may change in different sentences due to label noise or context. A flipping model is proposed to model the polar flipping of words in a sentence. We compile three Chinese datasets on the basis of our labeling strategy and proposed methodology. Experiments demonstrate that the proposed method outperforms state-of-the-art algorithms on both benchmark English data and our compiled Chinese data.
- Published
- 2022
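The contrast between one-hot encoding and a ρ-hot-style encoding can be illustrated with a sketch. The paper's exact scheme is not reproduced here; this version simply scales the active component by a cue weight ρ in (0, 1] instead of a hard 1, with invented weights and a hypothetical default:

```python
# Illustrative one-hot vs. "rho-hot"-style encoding over a toy vocabulary.
# RHO holds hypothetical per-word cue weights; unknown words fall back
# to an invented default strength.
VOCAB = ["not", "good", "movie"]
RHO = {"not": 1.0, "good": 0.9}

def one_hot(word):
    return [1.0 if w == word else 0.0 for w in VOCAB]

def rho_hot(word, default=0.5):
    rho = RHO.get(word, default)
    return [rho if w == word else 0.0 for w in VOCAB]
```

The point of such an encoding is that strong lexical cues (e.g., negation words) keep a large activation while weaker words are damped, instead of every word getting the same hard 1 as in one-hot encoding.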
11. Aspect-Based Sentiment Analysis with New Target Representation and Dependency Attention
- Author
-
Ou Wu, Tao Yang, Qing Yin, and Lei Yang
- Subjects
Dependency (UML), Computer science, Deep learning, Sentiment analysis, Inference, Context (language use), Lexicon, Human-Computer Interaction, Artificial intelligence, Representation (mathematics), Software, Sentence, Natural language processing
- Abstract
Aspect-based sentiment analysis (ABSA) is crucial for exploring user feedback and preferences about products or services. Although numerous classical deep learning-based methods have been proposed in previous literature, several useful cues (e.g., contextual, lexical, and syntactic) are still not fully considered and utilized. In this study, a new approach for ABSA is proposed under the guidance of contextual, lexical, and syntactic cues. First, a novel sub-network is introduced to represent a target in a sentence by considering the whole context. Second, lexicon embedding is applied to incorporate additional lexical cues. Third, a new attention module, namely dependency attention, is proposed to elaborate syntactic dependency cues between words in attention inference. Experimental results on four benchmark data sets demonstrate the effectiveness of our proposed approach to aspect-based sentiment analysis.
- Published
- 2022
12. Sociocultural factors during COVID-19 pandemic: Information consumption on Twitter
- Author
-
Leopoldo G. Arias-Bolzmann and Maximiliano Perez-Cepeda
- Subjects
Marketing, Social network, Twitter, Internet privacy, Sentiment analysis, COVID-19, Lexicon, Netnography, Social group, Categorization, Pandemic, Social media, Sociology, Sociocultural factors, Sociocultural evolution
- Abstract
The purpose of the research is to describe the sociocultural factors that emerged during the COVID-19 pandemic. Twitter is used as an instrument for data collection. The study is qualitative and uses the netnographic method. To analyze the flow of messages posted on Twitter, the model proposed by Perez-Cepeda and Arias-Bolzmann (2020), which describes sociocultural factors, is taken as a basis. The semantics that people use are a type of functional knowledge that reveals sociocultural factors. Sentiments were analyzed through lexicon-based methods, which are the most suitable. The categorization and classification of the data are performed based on the information that users post on Twitter. The tweets related to COVID-19 describe the sociocultural issues and the level of sentiment around the pandemic. The discussion centers on the COVID-19 pandemic, information consumption, lexicon, sociocultural factors, and sentiment analysis. The study was limited to the social media platform Twitter; another limitation was that it did not consider the social group of the users who interact with @pandemic_Covid-19, the official account of the World Health Organization (WHO). This research contributes to the social sciences, focusing on sociocultural interaction through the use of the social network Twitter. It describes the link between sociocultural factors and the level of sentiment on issues related to the COVID-19 pandemic.
- Published
- 2022
13. Enriching the transfer learning with pre-trained lexicon embedding for low-resource neural machine translation
- Author
-
Mieradilijiang Maimaiti, Maosong Sun, Yang Liu, and Huanbo Luan
- Subjects
Multidisciplinary, Artificial neural network, Machine translation, Computer science, Translation, Lexicon, Feature (machine learning), Embedding, Artificial intelligence, Transfer of learning, Natural language processing
- Abstract
Most State-Of-The-Art (SOTA) Neural Machine Translation (NMT) systems today achieve outstanding results based only on large parallel corpora. Large-scale parallel corpora for high-resource languages are easily obtainable. However, the translation quality of NMT for morphologically rich languages is still unsatisfactory, mainly because of the data sparsity problem encountered in Low-Resource Languages (LRLs). In the low-resource NMT paradigm, Transfer Learning (TL) has developed into one of the most efficient methods. It is difficult to train a model on high-resource languages that includes the information of both parent and child models, since the initially trained model contains only the lexicon features and word embeddings of the parent model rather than those of the child languages. In this work, we aim to address this issue by proposing the language-independent Hybrid Transfer Learning (HTL) method for LRLs, sharing lexicon embedding between parent and child languages without leveraging back translation or manually injecting noise. First, we train the High-Resource Languages (HRLs) as the parent model with its vocabularies. Then, we combine the parent and child language pairs using the oversampling method to train the hybrid model, initialized from the previously trained parent model. Finally, we fine-tune the morphologically rich child model using the hybrid model. Besides, we report some interesting findings on the original TL approach. Experimental results show that our model consistently outperforms five SOTA methods for two languages, Azerbaijani (Az) and Uzbek (Uz). Meanwhile, our approach is practical and significantly better, achieving improvements of up to 4.94 and 4.84 BLEU points for the low-resource child language pairs Az→Zh and Uz→Zh, respectively.
- Published
- 2022
14. Neurocomputational Models of Language Processing
- Author
-
John Hale, Jonathan Brennan, Shohini Bhattasali, Jixing Li, Christophe Pallier, and Luca Campanelli
- Subjects
Linguistics and Language, Sequence, Computational model, Parsing, Computer science, Deep learning, Contrast (statistics), Variety (linguistics), Lexicon, Language and Linguistics, Neurolinguistics, Artificial intelligence, Natural language processing
- Abstract
Efforts to understand the brain bases of language face the Mapping Problem: At what level do linguistic computations and representations connect to human neurobiology? We review one approach to this problem that relies on rigorously defined computational models to specify the links between linguistic features and neural signals. Such tools can be used to estimate linguistic predictions, model linguistic features, and specify a sequence of processing steps that may be quantitatively fit to neural signals collected while participants use language. Progress has been helped by advances in machine learning, attention to linguistically interpretable models, and openly shared data sets that allow researchers to compare and contrast a variety of models. We describe one such data set in detail in the Supplemental Appendix.
- Published
- 2022
15. A Novel Sentiment Polarity Detection Framework for Chinese
- Author
-
Huan Rong, Yuan Tian, Tinghuai Ma, Mznah Al-Rodhaan, Jie Cao, and Yongsheng Hao
- Subjects
Computer science, Polarity (physics), Perspective (graphical), Lexicon, Human-Computer Interaction, Feature (machine learning), Artificial intelligence, Pruning (decision trees), Representation (mathematics), Software, Natural language processing
- Abstract
Nowadays, mining opinion or sentiment from online user-generated texts has become a research hot-spot. Although a large amount of lexicon-based Chinese polarity detection work has been done, existing methods share one common flaw: the same word can have opposite polarities in different seed-lexicons, a problem known as polarity fuzziness. Therefore, in order to further enhance the performance of Chinese sentiment polarity detection, in this paper we start from a two-aspect lexicon expansion, specifically detecting sentiment polarity for new words and revising sentiment polarity for words already defined in seed-lexicons, so that polarity fuzziness can be avoided. Then, we formulate a novel sentiment polarity detection framework for Chinese (SPDFC) with particular attention to fine-grained sentiment processing, involving symmetrical mapping, sentiment feature pruning, and text representation. In this way, word polarity can be taken directly as a feature, carrying through into the polarity detection phase. According to the experimental results, compared to other classical and state-of-the-art methods, the proposed SPDFC framework achieves the best overall performance from the perspectives of Chinese polarity detection, sentiment feature pruning, and text representation.
- Published
- 2022
16. DepecheMood++: A Bilingual Emotion Lexicon Built Through Simple Yet Powerful Techniques
- Author
-
Lorenzo Gatti, Oscar Araque, Marco Guerini, Jacopo Staiano, and Human Media Interaction
- Subjects
Emotion lexicon, Vocabulary, Computer science, Emotion analysis, Lexicon, Machine learning, Sentiment analysis, Human-Computer Interaction, Sadness, Word embeddings, Happiness, Artificial intelligence, Computation and Language (cs.CL), Computers and Society (cs.CY), Software, Natural language processing
- Abstract
Several lexica for sentiment analysis have been developed and made available in the NLP community. While most of these come with word polarity annotations (e.g., positive/negative), attempts at building lexica for finer-grained emotion analysis (e.g., happiness, sadness) have recently attracted significant attention. Such lexica are often exploited as a building block in the process of developing learning models for which emotion recognition is needed, and/or used as baselines against which to compare the performance of the models. In this work, we contribute two new resources to the community: a) an extension of an existing and widely used emotion lexicon for English; and b) a novel version of the lexicon targeting Italian. Furthermore, we show how simple techniques can be used, in both supervised and unsupervised experimental settings, to boost performance on datasets and tasks of varying degrees of domain-specificity.
- Published
- 2022
17. Comparative analysis of machine learning-based classification models using sentiment classification of tweets related to COVID-19 pandemic
- Author
-
Kamal Gulati, S. Saravana Kumar, Raja Sarath Kumar Boddu, Ketan Sarvakar, Dilip Kumar Sharma, and M.Z.M. Nomani
- Subjects
Data collection, Computer science, Bigram, Sentiment analysis, Perceptron, Lexicon, Machine learning, Classifier (linguistics), Trigram, AdaBoost, Artificial intelligence
- Abstract
Sentiment Analysis (SA) is the area of research that seeks useful information in the sentiments people share on social networking platforms like Twitter, Facebook, etc. Such analysis is useful for classifying sentiments as positive, negative, or neutral. Sentiment classification can be done with a traditional lexicon-based approach or with machine learning techniques. In this research paper, we present a comparative analysis of popular machine learning-based classifiers. We experimented with tweet datasets related to the COVID-19 epidemic, applying seven machine learning-based classifiers to more than 72,000 tweets related to COVID-19. We performed experiments in three modes, i.e., Unigram, Bigram, and Trigram. As per the results, Linear SVC, Perceptron, Passive Aggressive Classifier, and Logistic Regression are able to achieve more than 98% maximum accuracy in classification (unigram, bigram, trigram) and are very close to each other in terms of performance. The average accuracies achieved by Linear SVC, Perceptron, Passive Aggressive Classifier, and Logistic Regression are 0.981573613, 0.976506357, 0.981573613, and 0.976690621, respectively. The AdaBoost Classifier performs worst among all classifiers, with an average accuracy of 0.731435416. Details regarding data collection, experimentation, and results are presented in the research paper.
- Published
- 2022
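The unigram/bigram/trigram modes those experiments vary can be sketched as a plain n-gram count extractor; a classifier such as Linear SVC would then be trained on these counts (or a weighted variant of them). The example sentence is invented:

```python
# Sketch of n-gram feature extraction: count contiguous token n-grams,
# the features varied across the paper's unigram/bigram/trigram modes.
from collections import Counter

def ngram_features(text, n):
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

uni = ngram_features("stay home stay safe", 1)   # unigram mode
bi = ngram_features("stay home stay safe", 2)    # bigram mode
```

Larger n captures more local word order at the cost of sparser features, which is why such comparisons report accuracy separately per mode.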
18. Pseudo-siamese networks with lexicon for Chinese short text matching
- Author
-
Chiyu Wang, Jiawen Shi, Jiale Zhou, Zhicheng Pang, and Hong Li
- Subjects
Statistics and Probability, Artificial Intelligence, Computer science, Text matching, General Engineering, Lexicon, Natural language processing
- Abstract
Short text matching is one of the fundamental technologies in natural language processing. In previous studies, most text matching networks were initially designed for English text. The common approach to applying them to Chinese is segmenting each sentence into words and then taking these words as input. However, this method often results in word segmentation errors. Chinese short text matching faces the challenges of constructing effective features and understanding the semantic relationship between two sentences. In this work, we propose a novel lexicon-based pseudo-siamese model (CL2N), which can fully mine the information expressed in Chinese text. Instead of utilizing a character sequence or a single word sequence, CL2N augments the text representation with multi-granularity information from characters and lexicons. Additionally, it integrates sentence-level features through single-sentence features as well as interactive features. Experimental studies on two Chinese text matching datasets show that our model performs better than state-of-the-art short text matching models, and the proposed method can solve the error propagation problem of Chinese word segmentation. Particularly, the incorporation of single-sentence features and interactive features allows the network to capture the contextual semantics and co-attentive lexical information, which contributes to our best result.
- Published
- 2021
19. COVID-19 Lexicon in English News Reports Based on the Theory of Semantic Field
- Author
-
Mengxi Wu
- Subjects
Coronavirus disease 2019 (COVID-19), Computer science, Lexicon, Semantic field, Artificial intelligence, Natural language processing
- Abstract
Coronavirus disease, or simply COVID-19, has affected many regions worldwide. The pandemic has caused great losses in all walks of life, and millions of people have died from the virus. In order to facilitate people's understanding of COVID-19, the present study adopts the theory of semantic field to analyze the COVID-19 lexicon that appeared in China Daily, an authoritative international daily newspaper issued by China. A total of 100 pieces of English news issued by China Daily were randomly selected for this research. According to the theory of semantic field in structural linguistics, the meaning of a word cannot stand alone, but comes into being together with the meanings of its related words. Therefore, it is reasonable to try to understand COVID-19 as thoroughly as possible through its relevant words, which form its semantic field.
- Published
- 2021
20. Ovarian cancer reporting lexicon for computed tomography (CT) and magnetic resonance (MR) imaging developed by the SAR Uterine and Ovarian Cancer Disease-Focused Panel and the ESUR Female Pelvic Imaging Working Group
- Author
-
Priyanka Jha, Atul B. Shinagare, Liina Poder, Elizabeth A. Sadowski, Jeanne M. Horowitz, H. A. Vargas, Marcia C. Javitt, Lucia Manganaro, Aki Kido, Katherine E. Maturen, Andrea Rockall, Hye Sun Park, Yulia Lakhman, Olivera Nikolic, Stephanie Nougaret, Isabelle Thomassin-Naggara, Gaiane M. Rauch, Evis Sala, Neil S. Horowitz, Rosemarie Forstner, Olga R. Brook, Susanna I. Lee, Aradhana M. Venkatesan, Caroline Reinhold, and S. Wallace
- Subjects
Ovarian Neoplasms ,medicine.medical_specialty ,Isoflurophate ,Magnetic Resonance Spectroscopy ,medicine.diagnostic_test ,business.industry ,Magnetic resonance imaging ,Interventional radiology ,General Medicine ,Gynecologic oncology ,Disease ,medicine.disease ,Lexicon ,Magnetic Resonance Imaging ,Article ,Humans ,Medicine ,Female ,Radiology, Nuclear Medicine and imaging ,Radiology ,Tomography, X-Ray Computed ,business ,Ovarian cancer ,Radiation treatment planning ,Neuroradiology - Abstract
Objectives Imaging evaluation is an essential part of treatment planning for patients with ovarian cancer. Variation in the terminology used for describing ovarian cancer on computed tomography (CT) and magnetic resonance (MR) imaging can lead to ambiguity and inconsistency in clinical radiology reports. The aim of this collaborative project between the Society of Abdominal Radiology (SAR) Uterine and Ovarian Cancer (UOC) Disease-focused Panel (DFP) and the European Society of Uroradiology (ESUR) Female Pelvic Imaging (FPI) Working Group was to develop an ovarian cancer reporting lexicon for CT and MR imaging. Methods Twenty-one members of the SAR UOC DFP and ESUR FPI working group, one radiology clinical fellow, and two gynecologic oncology surgeons formed the Ovarian Cancer Reporting Lexicon Committee. Two attending radiologist members of the committee prepared a preliminary list of imaging terms that was sent as an online survey to 173 radiologists and gynecologic oncologic physicians, of whom 67 responded to the survey. The committee reviewed these responses to create a final consensus list of lexicon terms. Results An ovarian cancer reporting lexicon was created for CT and MR imaging. This consensus-based lexicon has 6 major categories of terms: general, adnexal lesion-specific, peritoneal carcinomatosis-specific, lymph node-specific, metastatic disease-specific, and fluid-specific. Conclusions This lexicon for CT and MR imaging evaluation of ovarian cancer patients has the capacity to improve the clarity and consistency of reporting disease sites seen on imaging. Key points • This reporting lexicon for CT and MR imaging provides a list of consensus-based, standardized terms and definitions for reporting sites of ovarian cancer on imaging at initial diagnosis or follow-up. • Use of standardized terms and morphologic imaging descriptors can help improve interdisciplinary communication of disease extent and facilitate optimal patient management.
• Radiologists should identify and communicate areas of disease, including difficult-to-resect or potentially unresectable disease that may limit the ability to achieve optimal resection.
- Published
- 2021
21. Neuro-fuzzy network incorporating multiple lexicons for social sentiment analysis
- Author
-
Seba Susan and Srishti Vashishtha
- Subjects
Neuro-fuzzy ,Computer science ,Application of Soft Computing ,Computational intelligence ,Machine learning ,computer.software_genre ,Neuro-fuzzy network ,Fuzzy logic ,Theoretical Computer Science ,Social media ,Set (abstract data type) ,Sentiment analysis ,ANFIS ,Lexicon ,Tweets ,Adaptive neuro fuzzy inference system ,business.industry ,Deep learning ,ComputingMethodologies_PATTERNRECOGNITION ,Geometry and Topology ,Artificial intelligence ,business ,computer ,Software ,Natural language - Abstract
We propose MultiLexANFIS, an adaptive neuro-fuzzy inference system (ANFIS) that incorporates inputs from multiple lexicons to perform sentiment analysis of social media posts. We classify tweets into two classes, neutral and non-neutral; the latter class includes both positive and negative polarity. This type of classification is relevant for applications that aim to test the neutrality of content posted by users on social media platforms. In our proposed model, features are extracted by integrating natural language processing with fuzzy logic; hence, the model is able to deal with the fuzziness of natural language in an efficient and automatic manner. We propose a novel set of 64 rules for the neuro-fuzzy network that can classify tweets correctly by working on fuzzy features fetched from the VADER, AFINN and SentiWordNet lexicons. The proposed rules are domain independent, i.e., they can be extended to any textual data that employs lexicons. The antecedent and consequent parameters of the ANFIS are optimized iteratively by gradient descent and least squares estimation, respectively. The key contributions of this paper are: (1) a novel neuro-fuzzy system, MultiLexANFIS, that takes as its input the positive and negative sentiment scores of tweets computed from multiple lexicons (VADER, AFINN and SentiWordNet) in order to classify tweets into neutral and non-neutral content; (2) a novel set of 64 rules for the Sugeno-type fuzzy inference system MultiLexANFIS; (3) single-lexicon-based ANFIS variants to classify tweets when multiple lexicons are not available; and (4) a comparison of MultiLexANFIS with fuzzy, non-fuzzy and deep learning state of the art on various benchmark datasets, revealing the superiority of our proposed neuro-fuzzy system for social sentiment analysis.
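A minimal sketch of the multi-lexicon feature-extraction step, assuming toy stand-ins for the VADER, AFINN and SentiWordNet lexicons (the real lexicons and the ANFIS itself are not reproduced here). For each lexicon it yields a positive and a negative sum, the kind of paired inputs a MultiLex-style model would fuse:

```python
# Miniature, invented lexicons standing in for VADER, AFINN, SentiWordNet.
LEXICONS = {
    "vader_like": {"good": 1.9, "bad": -2.5, "great": 3.1},
    "afinn_like": {"good": 3.0, "bad": -3.0, "great": 3.0},
    "swn_like":   {"good": 0.75, "bad": -0.65, "great": 0.8},
}

def lexicon_features(tweet):
    """For each lexicon, return (positive_sum, negative_sum) over tokens."""
    feats = {}
    tokens = tweet.lower().split()
    for name, lex in LEXICONS.items():
        pos = sum(s for t in tokens if (s := lex.get(t, 0.0)) > 0)
        neg = sum(-s for t in tokens if (s := lex.get(t, 0.0)) < 0)
        feats[name] = (pos, neg)
    return feats

def is_neutral(tweet, threshold=0.5):
    """Crude neutrality test: average net score across all lexicons.
    The real model learns this decision with fuzzy rules instead."""
    feats = lexicon_features(tweet)
    net = sum(p - n for p, n in feats.values()) / len(feats)
    return abs(net) < threshold
```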
- Published
- 2021
22. Sentiment Analysis on Government Performance in Tourism During The COVID-19 Pandemic Period With Lexicon Based
- Author
-
Adri Priadana and Ahmad Ashril Rizal
- Subjects
Government ,Naive Bayes classifier ,Sentiment analysis ,Confusion matrix ,Social media ,General Medicine ,Business ,Marketing ,Lexicon ,Period (music) ,Tourism - Abstract
The COVID-19 pandemic has affected all industries in Indonesia and even worldwide, including the tourism industry. Researchers play a role in answering the needs of the tourism industry, especially in designing tourism and business destination management programs and carrying out activities oriented to those needs. Meanwhile, the government plays a role in making policies, especially the roadmap for developing the tourism industry. This study aims to track trending topics on the social media platform Instagram since COVID-19 hit. The trending topics are then classified by sentiment analysis using a lexicon-based approach and a Naive Bayes classifier. Instagram data collected since January 2020 shows the five most prominent topics in the tourism sector: health protocols, hotels, homes, streets, and beaches. Sentiment analysis on these five topics with the lexicon-based approach and the Naive Bayes classifier shows that beaches receive a strongly positive sentiment (80.87%), while hotels receive the highest negative sentiment (57.89%). Evaluation with a confusion matrix shows accuracy, precision, and recall of 82.53%, 86.99%, and 83.43%, respectively.
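For reference, the reported accuracy, precision and recall follow directly from a binary confusion matrix. A small self-contained sketch with hypothetical counts (the paper publishes only the resulting percentages, not the raw matrix):

```python
def confusion_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

# Hypothetical counts for illustration only; the study reports
# 82.53% accuracy, 86.99% precision and 83.43% recall.
acc, prec, rec = confusion_metrics(tp=141, fp=21, fn=28, tn=90)
```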
- Published
- 2021
23. Context-sensitive lexicon for imbalanced text sentiment classification using bidirectional LSTM
- Author
-
M. R. Pavan Kumar and Prabhu Jayagopal
- Subjects
Computer science ,business.industry ,Context (language use) ,Subject (documents) ,Variety (linguistics) ,Semantics ,Lexicon ,computer.software_genre ,Industrial and Manufacturing Engineering ,Task (project management) ,Resource (project management) ,Artificial Intelligence ,Oversampling ,Artificial intelligence ,business ,computer ,Software ,Natural language processing - Abstract
A sentiment lexicon is a reliable resource in computing sentiment classification. However, a general-purpose lexicon alone is not sufficient, since text sentiment classification is perceived as a context-dependent task in the literature. Moreover, we observe that many people tend to imitate others while writing reviews; as such, the body of public opinion towards an entity ends up as an imbalanced corpus. In this paper, we induce a context-based lexicon as a resource to explore imbalanced text sentiment classification. This method addresses the two critical problems mentioned above. First, it identifies subjective words relative to the context and computes weight scores for subjective terms and for the full review. In recent years, the application of RNNs to a variety of problems has been remarkable, especially in natural language processing tasks; thus, we take advantage of the context-based lexicon together with a bidirectional LSTM to handle text sentiment classification. Second, it deals with imbalanced data by deploying a text-based oversampling method that creates new synthetic text samples. The reason for using a text-based oversampling method is to make use of the semantics of the information while creating new text samples. Experimental results show that leveraging a sentiment lexicon relative to the context, together with a bidirectional LSTM and text-based oversampling, is useful in imbalanced text sentiment classification and achieves state-of-the-art results over deep neural learning baselines.
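One simple way to realize text-based oversampling is to synthesize new minority-class documents by recombining sentences drawn from existing ones, so the synthetic samples stay semantically plausible. This is a simplified stand-in; the paper's exact synthesis procedure may differ:

```python
import random

def oversample_texts(minority_docs, target_size, seed=0):
    """Grow the minority class to `target_size` by building synthetic
    documents from sentences of existing minority-class documents."""
    rng = random.Random(seed)
    sentences = [s.strip() for d in minority_docs
                 for s in d.split(".") if s.strip()]
    synthetic = list(minority_docs)
    while len(synthetic) < target_size:
        k = rng.randint(2, 3)  # 2-3 sentences per synthetic sample
        picked = rng.sample(sentences, min(k, len(sentences)))
        synthetic.append(". ".join(picked) + ".")
    return synthetic
```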
- Published
- 2021
24. UNDERSTANDING CONSTRUCTION SITE SAFETY HAZARDS THROUGH OPEN DATA: TEXT MINING APPROACH
- Author
-
Neththi Kumara Appuhamilage Heshani Rupasinghe and Kriengsak Panuwatwanich
- Subjects
Iterative and incremental development ,Environmental Engineering ,business.industry ,Computer science ,General Chemical Engineering ,General Engineering ,Energy Engineering and Power Technology ,Geotechnical Engineering and Engineering Geology ,Lexicon ,Machine learning ,computer.software_genre ,Computer Science Applications ,Construction site safety ,Random forest ,Support vector machine ,Accident (fallacy) ,Open data ,Artificial intelligence ,business ,computer ,Classifier (UML) - Abstract
Construction is an industry well known for its very high rate of injuries and accidents around the world. Even though many researchers are engaged in analysing the risks of this industry using various techniques, construction accidents still require much attention in safety science. According to the existing literature, hazards related to workers, technology, natural factors, surrounding activities and organisational factors are primary causes of accidents. Yet, there has been limited research aimed at ascertaining the extent of these hazards based on actual reported accidents. Therefore, the study presented in this paper was conducted with the purpose of devising an approach to extract sources of hazards from publicly available injury reports using Text Mining (TM) and Natural Language Processing (NLP) techniques. This paper presents a methodology for developing a rule-based extraction tool, providing full details of lexicon building, devising extraction rules, and the iterative process of testing and validation. In addition, the developed rule-based classifier was compared with, and found to outperform, existing statistical classifiers such as Support Vector Machine (SVM), Kernel SVM, K-nearest neighbours, the Naïve Bayesian classifier and the Random Forest classifier. The findings obtained using the developed tool identified the worker factor as the highest contributor to construction site accidents, followed by the technological factor, surrounding activities, the organisational factor, and the natural factor (1%). The developed tool could be used to quickly extract sources of hazards by converting widely available unstructured digital accident data into structured attributes, allowing better data-driven safety management.
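A rule-based extraction step of this kind can be sketched as keyword lexicons per hazard category: a report is tagged with every category whose keywords it mentions. The categories below mirror the paper's hazard sources, but the keyword lists are purely illustrative, not the paper's lexicon:

```python
# Hypothetical keyword lexicon; only the category names follow the paper.
HAZARD_LEXICON = {
    "worker":         ["slipped", "fell", "fatigue", "unsecured harness"],
    "technological":  ["crane", "scaffold", "machine", "equipment failure"],
    "surrounding":    ["adjacent excavation", "nearby traffic", "falling debris"],
    "organisational": ["no training", "missing permit", "unsupervised"],
    "natural":        ["storm", "heavy rain", "high wind"],
}

def extract_hazards(report):
    """Return every hazard category whose keywords appear in the report,
    turning unstructured accident text into structured attributes."""
    text = report.lower()
    return sorted(cat for cat, kws in HAZARD_LEXICON.items()
                  if any(kw in text for kw in kws))
```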
- Published
- 2021
25. Application of Optimization and Machine Learning for Sentiment Analysis
- Author
-
Devendra Singh Rathod and Manitosh Chourasiya
- Subjects
Phrase ,business.industry ,Computer science ,Process (engineering) ,Scale (chemistry) ,Sentiment analysis ,Lexicon ,Machine learning ,computer.software_genre ,Automation ,Ensemble learning ,Artificial intelligence ,business ,computer ,Sentence - Abstract
Sentiment analysis detects emotions expressed in text features and is known as one of the most important parts of opinion extraction. Through this process, we can determine whether a text is positive, negative or neutral. In this research, sentiment analysis is performed on textual data. A text sentiment analyzer combines natural language processing (NLP) and machine learning techniques to assign weighted sentiment scores to entities, subjects, and categories within a sentence or phrase. In expressing mood, the polarity of text reviews can be graded on a negative-to-positive scale using a learning algorithm. The current decade has seen significant developments in artificial intelligence, and the machine learning revolution has changed the entire AI industry; machine learning techniques have become an integral part of any model in today's computing world. Ensemble learning techniques in particular promise a high level of automation through the extraction of generalized rules for text and sentiment classification. This thesis aims to design and implement an optimized feature matrix using ensemble learning for sentiment classification and its applications.
- Published
- 2021
26. Sentiment Classification for Film Reviews by Reducing Additional Introduced Sentiment Bias
- Author
-
Fery Ardiansyah Effendi and Yuliant Sibaroni
- Subjects
Hyperparameter ,business.industry ,Computer science ,Polarity (physics) ,Information technology ,T58.5-58.64 ,computer.software_genre ,Lexicon ,Systems engineering ,TA168 ,Preprocessor ,Classification methods ,Artificial intelligence ,Model configuration ,Sentiment Classification, Machine Learning, ANN, Lexicon-based method, BAT, SO-Cal ,business ,Klasifikasi Sentimen, Pembelajaran Mesin, ANN, Metode Lexicon-based, BAT, SO-Cal ,computer ,Row ,Word (computer architecture) ,Natural language processing - Abstract
The film business and its reviews cannot be separated, and film review sites such as IMDb are a credible source of reviews posted in public forums. Because IMDb reviews are unstructured and bias-heavy, classification methods that reduce additional sentiment bias are needed to create a balanced classification with lower polarity bias. Eliminating additional sentiment bias improves the model because polarity is defined by a non-biased method, resulting in models that correctly determine which sequences of words are positive or negative. This research limits the dataset to 50,000 rows of randomly extracted reviews from the IMDb website, prepared using preprocessing, POS-tagging, and word embeddings. The preprocessed data is then used in classification methods such as ANN, SWN, and SO-Cal. This paper also uses bias processing methods such as hyperparameter tuning and BPM, with outputs evaluated using accuracy and PBR metrics. This research yields 77.39% for ANN, 66.32% for BPM, 75.6% for SO-Cal, and 76.26% for hybrid classification. The best PBR resulted from the two lexicon-based methods: 0.0009 for BPM and 0.00006 for SO-Cal. More advanced model configuration in ANN could improve the model, and more complex lexicon models are a future research topic.
- Published
- 2021
27. EmoChannel-SA: exploring emotional dependency towards classification task with self-attention mechanism
- Author
-
Haoran Xie, Qing Li, Zongxi Li, Xiaohui Tao, Gary Cheng, and Xinhong Chen
- Subjects
Dependency (UML) ,Computer Networks and Communications ,business.industry ,Computer science ,Emotion classification ,Sentiment analysis ,Lexicon ,computer.software_genre ,Task (project management) ,Hardware and Architecture ,Emotional dependency ,Domain knowledge ,Artificial intelligence ,business ,computer ,Software ,Natural language processing ,Sentence - Abstract
Exploiting hand-crafted lexicon knowledge to enhance emotional or sentimental features at the word level has become a widely adopted method in emotion-relevant classification studies. However, few attempts have been made to explore emotion construction in the classification task, which provides insights into how a sentence's emotion is constructed. The major challenge of exploring emotion construction is that current studies treat dataset labels as relatively independent emotions, which overlooks the connections among different emotions. This work aims to understand coarse-grained emotion construction and its dependency by incorporating fine-grained emotions from domain knowledge. Incorporating domain knowledge and dimensional sentiment lexicons, our previous work proposed a novel method named EmoChannel to capture the intensity variation of a particular emotion in time series. We utilize the resultant knowledge of 151 available fine-grained emotions to comprise the representation of sentence-level emotion construction. Furthermore, this work explicitly employs a self-attention module to extract the dependency relationship among all emotions and proposes the EmoChannel-SA Network to enhance emotion classification performance. We conducted experiments demonstrating that the proposed method produces competitive performance against state-of-the-art baselines on both multi-class datasets and sentiment analysis datasets.
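A minimal numpy sketch of the scaled dot-product self-attention computation over a stack of emotion-channel vectors; this is illustrative only, since the paper's module uses learned projections trained end-to-end:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over X (n_channels x d),
    letting each emotion channel attend to every other channel."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return scores @ V

rng = np.random.default_rng(0)
d = 4
X = rng.standard_normal((151, d))  # 151 fine-grained emotion channels
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
```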
- Published
- 2021
28. Improving Attention Model Based on Cognition Grounded Data for Sentiment Analysis
- Author
-
Minglei Li, Qin Lu, Yunfei Long, Chu-Ren Huang, and Rong Xiang
- Subjects
Computer science ,business.industry ,media_common.quotation_subject ,05 social sciences ,Sentiment analysis ,Cognition ,Context (language use) ,02 engineering and technology ,Lexicon ,computer.software_genre ,Preference ,Human-Computer Interaction ,Reading (process) ,0202 electrical engineering, electronic engineering, information engineering ,020201 artificial intelligence & image processing ,Artificial intelligence ,0509 other social sciences ,050904 information & library sciences ,business ,computer ,Software ,Word (computer architecture) ,Natural language processing ,Sentence ,media_common - Abstract
Attention models have been proposed in sentiment analysis and other classification tasks because some words are more important than others. However, most existing methods use local context-based information, affective lexicons, or user preference information to train the attention models. In this work, we propose a novel attention model trained on cognition-grounded eye-tracking data. First, a reading prediction model is built using eye-tracking data as the dependent variable and other features in the context as independent variables. The predicted reading time is then used to build a cognition-grounded attention layer for neural sentiment analysis. Our model can capture attention in context both for words at the sentence level and for sentences at the document level. Other attention mechanisms can also be incorporated to capture other aspects of attention, such as local attention and affective lexicons. The results of our work comprise two parts. The first part compares our proposed cognition-grounded attention model with other state-of-the-art sentiment analysis models. The second part compares our model with an attention model based on other lexicon-based sentiment resources. Evaluations show that sentiment analysis using the cognition-grounded attention model significantly outperforms state-of-the-art sentiment analysis methods. Comparisons to affective lexicons also indicate that using cognition-grounded eye-tracking data has advantages over other sentiment resources by considering both word information and context information. This work brings insight into how cognition-grounded data can be integrated into natural language processing (NLP) tasks.
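The core idea of turning predicted per-word reading times into attention weights can be sketched with a softmax: words readers dwell on longer receive more attention. This illustrates the concept only, not the paper's trained attention layer:

```python
import math

def cognition_attention(reading_times, temperature=1.0):
    """Normalize predicted per-word reading times into attention weights
    via a numerically stable softmax."""
    scaled = [t / temperature for t in reading_times]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```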
- Published
- 2021
29. Extending Coverage of a Lexicon of Discourse Connectives Using Annotation Projection
- Author
-
Lucie Poláková, Jirí Mírovský, and Pavlína Synková
- Subjects
Annotation ,business.industry ,Computer science ,Artificial intelligence ,computer.software_genre ,Projection (set theory) ,business ,Lexicon ,Discourse connectives ,computer ,Natural language processing - Published
- 2021
30. Generating emotional response by conditional variational auto-encoder in open-domain dialogue system
- Author
-
Yuchen Shen, Bao Xiaoming, Liu Mengjuan, Jiang Liu, and Zhao Pei
- Subjects
Syntax (programming languages) ,Computer science ,business.industry ,Cognitive Neuroscience ,Rank (computer programming) ,Context (language use) ,computer.software_genre ,Semantics ,Lexicon ,Chatbot ,Computer Science Applications ,Ranking (information retrieval) ,Artificial Intelligence ,Relevance (information retrieval) ,Artificial intelligence ,business ,computer ,Natural language processing - Abstract
An important goal in open-domain dialogue research is to make a chatbot generate emotional responses given a context. To achieve this, some researchers have attempted to introduce affective information into neural dialogue models. However, these neural dialogue models still suffer from the problem of generating safe but meaningless responses, such as "I don't know", which quickly makes users lose interest in chatting. Fortunately, recent research has shown that the conditional variational auto-encoder (CVAE) can mitigate this problem and enhance response diversity. In this paper, we combine affective knowledge with a CVAE-based model to generate diverse and affective responses. First, we use an affective lexicon to determine each word's emotion in the input sentences and feed the affective vector together with its embedding vector into the CVAE-based model. Next, we construct semantic and affective loss functions, enabling the model to simultaneously learn the response's semantic and affective distributions. Additionally, we formulate a ranking rule to rank candidate responses according to their syntax, semantics, and affection scores, thereby enhancing emotion and relevance while retaining response diversity. Finally, we evaluate the proposed model on the DailyDialog and Reddit datasets. The experimental results show that our model can generate more emotional, diverse, and context-relevant responses than the baselines.
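The ranking rule can be illustrated as a weighted sum over per-candidate scores. The weights and score names below are assumptions made for the sketch, not the paper's actual formula:

```python
def rank_responses(candidates, weights=(0.3, 0.4, 0.3)):
    """Rank candidate responses by a weighted sum of syntax, semantics
    and affection scores (weights are illustrative assumptions)."""
    w_syn, w_sem, w_aff = weights
    def score(c):
        return w_syn * c["syntax"] + w_sem * c["semantics"] + w_aff * c["affect"]
    return sorted(candidates, key=score, reverse=True)

candidates = [
    {"text": "I don't know.", "syntax": 0.9, "semantics": 0.1, "affect": 0.1},
    {"text": "That sounds exciting!", "syntax": 0.8, "semantics": 0.7, "affect": 0.9},
]
best = rank_responses(candidates)[0]
```

Under such a rule, a fluent but generic response ("I don't know.") loses to a response that is both relevant and emotional.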
- Published
- 2021
31. How archival studies and knowledge management practitioners describe the value of research: assessing the 'quiet' archivist persona
- Author
-
Jennifer Y. Pearson
- Subjects
Value (ethics) ,History ,Knowledge management ,business.industry ,Discourse analysis ,Persona ,Library and Information Sciences ,Lexicon ,Archivist ,Archival science ,Leverage (negotiation) ,Sociology ,Set (psychology) ,business - Abstract
The archivist persona is frequently described in terms of passive, introverted attributes, which are then viewed as contributing to critical concerns for the sector, such as a lack of visibility, perceived effectiveness, and funding. This study is the first to assess the archivist persona through a discourse analysis, examining the usage of words promoting value and positive benefits in archival studies publications. Titles and abstracts from research articles published in five prominent journals between 2015 and 2019 were analysed for a set of 57 words connoting value or valuable benefits, including terms such as “innovative”, “positive”, and “strategic.” An identical analysis of research articles published in five knowledge management (KM) publications over the same timeframe was also completed in order to provide a comparative dataset from an adjacent, yet more corporate-embedded information practice. The results demonstrate that archival studies researchers use value words to promote the benefits of their research, but do so at a significantly lower frequency and density when compared to KM. A qualitative analysis of the results shows that archivists leverage a passive lexicon to promote value and benefits, relying on generic adjectives and indirect claims, whereas the lexicon of KM communicates direct, actionable outcomes that more readily align with business stakeholders’ priorities. These findings suggest practical communications recommendations for the archives sector, which could enhance business stakeholders’ perceptions of archivists and the value of archival work.
- Published
- 2021
32. Beethoven im 'Brockhaus'
- Author
-
Arnold Jacobshagen
- Subjects
Literature ,History ,Middle class ,business.industry ,media_common.quotation_subject ,Musical ,Lexicon ,Eleventh ,Reading (process) ,Encyclopedia ,Narrative ,business ,Music ,media_common - Abstract
Full-length biographies of Ludwig van Beethoven were not published until after the composer's death. During his lifetime, biographical articles in dictionaries and encyclopaedias were therefore a particularly important source of information, since general encyclopaedias achieved a much wider circulation than specialist music publications. The first entry on Beethoven appeared as early as 1790 in Ernst Ludwig Gerber's Historisch-Biographisches Lexicon der Tonkünstler. The most widely read encyclopaedia for the educated middle class was the Conversations-Lexicon oder enzyklopädisches Handwörterbuch für gebildete Stände, first published by Brockhaus in 1809. This paper comparatively examines the articles on Beethoven from the first decades of the 19th century until the eleventh edition of 1863, with regard to the emergence of typical narratives. It is noteworthy that the early entries on Beethoven were shorter than those for other contemporary composers, contained false biographical information, and were reluctant in their assessment of Beethoven's oeuvre. This changed only after the composer's death, raising the question of whether, in the eyes of the general reading public, Beethoven really was the predominant musical figure in the first decades of the nineteenth century.
- Published
- 2021
33. A Multi-Classification Sentiment Analysis Model of Chinese Short Text Based on Gated Linear Units and Attention Mechanism
- Author
-
Lei Liu, Sun Yinghong, and Hao Chen
- Subjects
General Computer Science ,Computer science ,business.industry ,Sentiment analysis ,Information processing ,Lexicon ,computer.software_genre ,Hotspot (Wi-Fi) ,Social media ,Artificial intelligence ,business ,computer ,Mechanism (sociology) ,Natural language processing - Abstract
Sentiment analysis of social media texts has become a research hotspot in information processing. Sentiment analysis methods based on the combination of machine learning and a sentiment lexicon need to select features, and the selected emotional features are often subjective, which can easily lead to overfitted models with poor generalization ability. Sentiment analysis models based on deep learning can automatically extract effective text emotional features, which greatly improves the accuracy of text sentiment analysis. However, due to the lack of a multi-classification emotional corpus, such models cannot accurately express emotional polarity. Therefore, we propose a multi-classification sentiment analysis model, GLU-RCNN, based on Gated Linear Units and an attention mechanism. Our model uses a Gated Linear Unit-based attention mechanism to integrate the local features extracted by a CNN with the semantic features extracted by an LSTM. The local features of short text are extracted and concatenated using multi-size convolution kernels. At the classification layer, the emotional features extracted by the CNN and the LSTM are concatenated to express the emotional features of the text. A detailed evaluation on two benchmark datasets shows that the proposed model outperforms state-of-the-art approaches.
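For reference, a Gated Linear Unit computes a linear projection modulated elementwise by a sigmoid gate, GLU(X) = (XW + b) ⊙ σ(XV + c). A minimal numpy sketch of the building block (the model's actual layers are learned, of course):

```python
import numpy as np

def glu(X, W, b, V, c):
    """Gated Linear Unit: linear projection times a sigmoid gate.
    X: (batch, d_in); W, V: (d_in, d_out); b, c: (d_out,)."""
    gate = 1.0 / (1.0 + np.exp(-(X @ V + c)))
    return (X @ W + b) * gate
```

With a zero gate projection the sigmoid is 0.5 everywhere, so the unit simply halves the linear output; nonzero gates learn which features to pass through.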
- Published
- 2021
34. Handwritten English word recognition using a deep learning based object detection architecture
- Author
-
Ram Sarkar, Samir Malakar, Elisa H. Barney Smith, and Riktim Mondal
- Subjects
Vocabulary ,Computer Networks and Communications ,Computer science ,business.industry ,Deep learning ,media_common.quotation_subject ,Word error rate ,computer.software_genre ,Lexicon ,ComputingMethodologies_PATTERNRECOGNITION ,Hardware and Architecture ,Handwriting ,Handwriting recognition ,Word recognition ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Media Technology ,Artificial intelligence ,business ,computer ,Software ,Natural language processing ,Word (computer architecture) ,media_common - Abstract
Handwriting is used to distribute information among people. To access this information for further analysis, the page needs to be optically scanned and converted to a machine-recognizable form. Due to unconstrained writing styles along with connected and overlapping characters, handwriting recognition remains a challenging task. Most methods in the literature use lexicon-based approaches and train their models on large datasets with nearly 50K word samples to achieve good results, which leads to high computational requirements. While these models use around 50K words in their dictionary when recognizing handwritten English text, the actual number of words in the language is much higher. To this end, we propose a lexicon-free handwriting recognition technique for handwritten English text based on a YOLOv3 object recognition model that performs sequential character detection and identification with a low number of training samples (only 1,200 word images). The model works well without any dependency on the writer's style. Tested on the IAM dataset, it achieves a 29.21% Word Error Rate and a 9.53% Character Error Rate without a predefined vocabulary, which is on par with state-of-the-art lexicon-based word recognition models.
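Word Error Rate, the headline metric here, is the word-level Levenshtein distance between hypothesis and reference divided by the number of reference words (Character Error Rate is the same computation over characters). A self-contained sketch:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (dynamic program)."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)  # substitution/match
    return d[m][n]

def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)
```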
- Published
- 2021
35. Enhanced lexicon E-SLIDE framework for efficient sentiment analysis
- Author
-
Geetika Vashisht and Manisha Jaillia
- Subjects
Interpretation (logic) ,Computer Networks and Communications ,business.industry ,Computer science ,Applied Mathematics ,Sentiment analysis ,Decision tree ,Lexicon ,computer.software_genre ,Computer Science Applications ,Task (project management) ,Bayes' theorem ,Computational Theory and Mathematics ,Artificial Intelligence ,Artificial intelligence ,Electrical and Electronic Engineering ,business ,Precision and recall ,computer ,Natural language processing ,Information Systems ,Meaning (linguistics) - Abstract
Idioms are multi-word non-compositional expressions whose meaning differs from the literal meaning of their parts, posing a significant challenge to interpretation. Found in all languages, idioms beautify a language but complicate the sentiment analysis task. The Extended Sentiment Lexicon of IDiomatic Expressions (ESLIDE), created for this work, is an extension of the state-of-the-art idiom lexicon SLIDE, which comprises five thousand frequently occurring idioms estimated from a large corpus of English. This paper examines the contribution of idioms as features in the sentiment analysis task. The classifiers used in this work (Naive Bayes, Multinomial Naive Bayes and Decision Trees) are evaluated using accuracy, precision and recall, showing an increase in performance across all three sentiment classes (positive, negative and neutral) and improving the baseline results by six percentage points.
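As an illustration of how idiom features of the kind ESLIDE provides might be fed to a classifier, here is a minimal sketch; the idiom entries and polarities below are invented stand-ins, not actual SLIDE/ESLIDE data:

```python
# Hypothetical mini idiom lexicon in the spirit of (E)SLIDE; the real
# lexicons assign sentiment labels to thousands of idioms.
IDIOM_POLARITY = {
    "over the moon": "positive",
    "under the weather": "negative",
    "break a leg": "positive",
}

def idiom_features(text):
    """Return idiom-based features to merge into a classifier's feature dict."""
    text = text.lower()
    feats = {}
    for idiom, polarity in IDIOM_POLARITY.items():
        if idiom in text:
            feats[f"idiom={idiom}"] = 1
            key = f"idiom_polarity={polarity}"
            feats[key] = feats.get(key, 0) + 1
    return feats
```

These features would be added alongside ordinary bag-of-words features before training Naive Bayes or a decision tree.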
- Published
- 2021
36. Studying the effect of characteristic vector alteration on Arabic sentiment classification
- Author
-
Ibtissam Touahri and Azzeddine Mazroui
- Subjects
General Computer Science, Computer science, Arabic, Opinion extraction, Context (language use), Lexicon, Set (abstract data type), Scarcity, Sentiment analysis, Negation, Supervised approach, Eigenvalues and eigenvectors, Arabic language, QA75.5-76.95, Semantic segmentation, Electronic computers. Computer science, Artificial intelligence, Word (computer architecture), Natural language processing
In this paper, we propose a new approach to sentiment analysis for the Arabic language. To overcome the scarcity and limited size of the Arabic language resources required for training and analysis tasks, we built new lexical resources using different approaches. We also integrated morphological information by creating both stemmed and lemmatized versions of the word lexicons. The generated resources were then used to construct a supervised model from a set of features that considers the negation context of words. Finally, we semantically segmented the lexicon in order to reduce the size of the model vectors and consequently improve the execution time.
- Published
- 2021
37. End-to-end aspect-based sentiment analysis with hierarchical multi-task learning
- Author
-
Li Jin, Xian Sun, Guangluan Xu, Xinyi Wang, and Zequn Zhang
- Subjects
Computer science, Cognitive Neuroscience, Sentiment analysis, Multi-task learning, Lexicon, Machine learning, Sequence labeling, Computer Science Applications, Task (project management), Discriminative model, Artificial Intelligence, Feature (machine learning), Representation (mathematics)
End-to-end aspect-based sentiment analysis (E2E-ABSA) is a sequence labeling task which detects aspect terms and the corresponding sentiment simultaneously. Previous works ignore the useful task-specific knowledge and embed the vital aspect and sentiment attributes implicitly in the intermediate layers. In this paper, we propose a hierarchical multi-task learning framework, which explicitly leverages task-related knowledge via the supervision of intermediate layers. Specifically, aspect term extraction, sentiment lexicon detection, and aspect sentiment detection are designed to encode the aspect boundary and sentiment information. The tasks are in charge of different perspectives and levels of knowledge, which provide multi-fold regulation effects to optimize the main task. Unlike vanilla multi-task learning, all the tasks are integrated into a hierarchical structure to help the higher-level tasks make full use of the lower-level tasks’ information. Experimental results on three datasets demonstrate that the proposed method achieves state-of-the-art results. Further analysis shows that the proposed method achieves better performance than single-task and vanilla multi-task learning methods and yields a more discriminative feature representation.
- Published
- 2021
38. Interpretation of Agricultural Pest Problem Handling System in Usada Wisada Pari Scripture
- Author
-
Putu Sabda Jayendra, Kadek Ayu Ekasani, Ida Bagus Subrahmaniam Saitya, and Made Wahyu Mahendra
- Subjects
BL51-65, Hinduism, Philosophy. Psychology. Religion, Interpretation (philosophy), Environmental ethics, Lexicon, Object (philosophy), Countermeasure, Agriculture, Ethnolinguistics, Natural (music), Philosophy of religion. Psychology of religion. Religion in relation to other subjects, Sociology, General Agricultural and Biological Sciences
The knowledge of cultivation and of treating and solving pest problems naturally, without neglecting the local culture, has been an inseparable aspect of Balinese agricultural life, which is known for its irrigation system called subak. This study examines the agricultural scripture Usada Wisada Pari from two perspectives: first, the types of pests and their countermeasures; second, the lexical forms used to name these pests. The study shows that the rice pests in the Usada Wisada Pari text fall into two categories, namely animals and plants. The countermeasures consist of natural ritual elements derived from plants, together with incantations. Furthermore, this research shows that all kinds of plague and agricultural pests, along with the ways to overcome them, reflect very strong Shivaistic teachings: all diseases, countermeasures and preventions are described as the authority of Lord Shiva, the god of destruction in the Hindu concept. It can be concluded that Usada Wisada Pari is a text that provides knowledge about rice pest antidotes in an environmentally friendly and holistic manner, since it involves both natural and religious elements. This study is expected to contribute to academics and future researchers, who can use it as a source and expand the object of study based on ethnoagriculture, as well as to the general public, who can increase their knowledge of alternative management of agricultural epidemics in synergy with nature and local wisdom.
- Published
- 2021
39. Social News Use & Citizen Participation among Young Activists in Singapore
- Author
-
Winston Jin Song Teo
- Subjects
Sociology and Political Science, Context (language use), Public relations, Lexicon, Democracy, Computer Science Applications, Politics, Political science, Mainstream, Social media, Social news
This article presents a study of how civically engaged young adults engage with news on social media, within the context of a developing democracy: Singapore. Based on in-depth interviews with 20 young activists, it discusses how they approach social media as a source of news, what motivates them to engage with more than one social news platform, and how social news use fits into their political lexicon. The results reveal that despite their affinity for news-related content on social media, they are partial towards neither mainstream nor alternative news providers on this medium. Their primary social news platform is the one perceived to offer the best means of disseminating news-related information. However, they are also concerned about their privacy and practise certain strategies to mitigate this. Despite its drawbacks, the activists accept social news use as a viable means of political socialisation and mobilisation.
- Published
- 2021
40. Market Reaction to iPhone Rumors
- Author
-
Zhang Wu, Yuchen Liu, and Terence Tai-Leung Chong
- Subjects
Event study, Market reaction, Lexicon, Stock price, Computer Science Applications, Computational Mathematics, New product development, Econometrics, Economics, Insider trading, Computer Vision and Pattern Recognition, Construct (philosophy), Finance
This paper studies the effects of new-product rumors about the iPhone on the stock price of Apple. We scrape iPhone rumors from Macrumors.com, obtaining a dataset of 1,264 articles, containing 180 words on average, published between January 2002 and December 2015. We then construct a market-decided lexicon to transform the qualitative information into quantitative data, and analyze which types of words, and which information embedded in the rumors, are apt to impact Apple's stock price. Unlike previous studies, we do not rely on the widely adopted Harvard-IV-4 dictionary, as the coefficients of words from that dictionary are neither significant nor consistent with their polarities when compared with our results. The paper obtains three main findings. First, the spread of rumors has a significant impact on the stock price. Second, positive words, rather than negative words, play an important role in moving the stock price. Third, the stock price is highly sensitive to words related to the appearance of the iPhone.
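A market-decided lexicon ultimately reduces each rumor article to a weighted word count. The sketch below illustrates only that scoring step, with made-up word weights rather than the paper's estimated coefficients:

```python
# Hypothetical word weights; the paper derives weights from the market
# reaction itself rather than from a fixed dictionary such as Harvard-IV-4.
WORD_WEIGHTS = {"thinner": 0.4, "faster": 0.6, "delay": -0.5, "leak": -0.2}

def rumor_score(article):
    """Sum the lexicon weights of all tokens in a rumor article."""
    tokens = article.lower().split()
    return sum(WORD_WEIGHTS.get(t, 0.0) for t in tokens)
```

In an event-study setting, such scores would then be regressed against abnormal returns around each article's publication date.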
- Published
- 2021
41. A comprehensive study of domain-specific emoji meanings in sentiment classification
- Author
-
Nader Mahmoudi, Paul Docherty, and Łukasz P. Olech
- Subjects
Syntax (programming languages), Emoji, Computer science, Context (language use), Semantics, Lexicon, Domain specificity, Management Information Systems, Classifier (linguistics), Artificial intelligence, Natural language processing, Information Systems, Meaning (linguistics)
The inclusion of emojis when solving natural language processing problems (e.g., text-based emotion detection, sentiment classification, topic analysis) improves the quality of the results. However, the existing literature focuses only on the general meaning conveyed by emojis and has not examined emojis in the context of investor sentiment classification. This article provides a comprehensive study of the impact that the inclusion of emojis can make in predicting stock investors' sentiment. We found that a classifier incorporating domain-specific emoji vectors, which capture the syntax and semantics of emojis in the financial context, can improve the accuracy of investor sentiment classification. Also, when domain-specific emoji vectors are considered, daily time series of investor sentiment demonstrate additional marginal explanatory power for returns and volatility. Further, a cluster analysis of domain-specific versus domain-independent emoji vectors shows different natural groupings of emojis, reflecting domain specificity when the special meaning of emojis is considered. Finally, domain-specific emoji vectors can yield significantly superior emoji sentiment lexicons. Given the importance of domain-specific emojis in investor sentiment classification of social media data, we have developed an emoji lexicon that can be used by other researchers.
- Published
- 2021
42. Extracting Sentiments by Using Fine-Grained Mining
- Author
-
Rathinavelu Arumugam and Gobi Natesan
- Subjects
Information retrieval, Relation (database), Computer science, Sentiment analysis, Rank (computer programming), Lexicon, Computer Science Applications, Set (abstract data type), Feature (machine learning), Graph (abstract data type), The Internet, Electrical and Electronic Engineering
With the rapid development of the web, a huge number of reviews of various kinds of products are appearing on the Internet. Many users purchase products online, saving time by avoiding travel. An opinion lexicon is a file of opinion words used to identify the positive, negative or neutral sentiments in reviews. However, it is difficult to extract features from a large corpus, so identifying the relations between opinion words and opinion targets in a specific domain allows a more thorough understanding of customers' opinions. In the existing system, opinion relations between opinion words and opinion targets are identified using a word alignment model, and relations among words are found by a graph-based co-ranking algorithm that estimates the confidence of each candidate. To filter false candidates, we propose a novel framework for fine-grained opinion mining that involves propagation and refinement processes. A three-layer opinion relation graph is used to identify potential relations and to rank all the feature candidates, which effectively alleviates the problems of error propagation and of discovering infrequent results. To attain a stable syntactic pattern set, propagation and refinement are performed, and the final review text is considered to be an opinion of a specific product. The proposed system outperforms prior fine-grained opinion mining frameworks by reducing error propagation and removing false results from the online reviews.
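Graph-based co-ranking of the kind referenced here propagates confidence between opinion words and opinion targets over a relation graph. The following is a simplified bipartite sketch (the paper's model uses a three-layer graph with additional refinement steps), with the damping factor and iteration count chosen arbitrarily:

```python
def co_rank(edges, seeds, iters=20, damping=0.85):
    """edges: {opinion_word: [targets]}; seeds: initial confidence scores.

    Confidence flows in both directions over the bipartite graph, in the
    spirit of PageRank-style co-ranking.
    """
    # Build a symmetric adjacency map over words and targets.
    adj = {}
    for w, targets in edges.items():
        for t in targets:
            adj.setdefault(w, set()).add(t)
            adj.setdefault(t, set()).add(w)
    conf = {n: seeds.get(n, 0.0) for n in adj}
    for _ in range(iters):
        conf = {
            n: (1 - damping) * seeds.get(n, 0.0)
            + damping * sum(conf[m] / len(adj[m]) for m in adj[n])
            for n in adj
        }
    return conf
```

Seeding a single opinion word spreads confidence evenly to the targets it modifies, which is exactly how unseen target candidates acquire scores.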
- Published
- 2021
43. A hybrid model for spelling error detection and correction for Urdu language
- Author
-
Muhammad Hasan Jamal, Usama Ijaz Bajwa, Muhammad Waqas Anwar, and Romila Aziz
- Subjects
Computer science, Lexicon, Spelling, Ranking (information retrieval), Soundex, Artificial Intelligence, Edit distance, F1 score, Error detection and correction, Software, Natural language processing, Word (computer architecture)
Detecting and correcting misspelled words in written text is of great importance for many natural language processing applications. Errors can be broadly classified into two groups: spelling errors and contextual errors. Spelling errors occur when the misspelled words do not exist in a dictionary and are meaningless, while contextual errors occur when the words do exist in the dictionary but their use is not as the writer intended. This paper presents an Urdu spell checker that detects incorrect spellings of a word using the widely used lexicon-lookup approach and provides a list of candidate words with correct spellings by applying the edit distance technique, which covers all types of spelling errors. To identify the best candidate word, the paper proposes a hybrid model that ranks the words in the candidate list. Multiple ranking techniques, namely Soundex, Shapex, LCS and N-gram, are used both standalone and in combination to determine the best technique in terms of F1 score. A dictionary containing 48,551 words was developed from the UMC corpus and an Urdu newspaper corpus. Our hybrid model achieves an F1 score of 94.02% when considering the top five suggested words and 88.29% when considering the top suggested word.
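The lexicon-lookup plus edit-distance pipeline described above can be sketched in a few lines; the dictionary below uses English stand-in words purely for illustration, not the paper's 48,551-word Urdu dictionary, and the ranking here is by raw edit distance rather than the hybrid Soundex/Shapex/LCS/N-gram model:

```python
def levenshtein(a, b):
    # Rolling one-row Levenshtein distance between two strings.
    d = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, cb in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (ca != cb))
    return d[-1]

def suggest(word, dictionary, max_dist=2):
    """Lexicon lookup: flag out-of-dictionary words and rank candidates
    within `max_dist` edits, closest first."""
    if word in dictionary:
        return []  # known word: nothing to correct
    cands = [(levenshtein(word, w), w) for w in dictionary]
    return [w for d, w in sorted(cands) if d <= max_dist]
```

A production ranker would re-order the surviving candidates with phonetic (Soundex) and shape-based (Shapex) similarity before picking the top suggestion.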
- Published
- 2021
44. Early-stage detection of eye diseases on microblogs: glaucoma recognition
- Author
-
Hosam Al-Samarraie and Samer Muthana Sarsam
- Subjects
Computer Networks and Communications, Computer science, Microblogging, Glaucoma, Lexicon, Artificial Intelligence, Classifier (linguistics), Social media, Electrical and Electronic Engineering, Multinomial logistic regression, Applied Mathematics, Eye diseases, Computer Science Applications, Hierarchical clustering, Sadness, Computational Theory and Mathematics, Natural language processing, Information Systems
Glaucoma is the most common optic neuropathy and causes blindness in people without warning signs. Early detection of glaucoma is crucial for early treatment that can delay vision loss. Since vision loss caused by glaucoma cannot be recovered, this study proposes an early detection mechanism for glaucoma using social media posts. Glaucoma-related tweets were collected using the Twitter streaming application programming interface (API). A hierarchical clustering algorithm was applied to group tweets that share similar features, and within each cluster a co-occurrence analysis was performed using the VOSviewer technique to map disease-specific terminology. Users' emotions (e.g., anger, fear, sadness and joy) and their polarity (positive, neutral and negative) were extracted using the NRC Affect Intensity Lexicon and SentiStrength. Glaucoma detection was performed using multinomial logistic regression, and the classifier was able to predict glaucoma tweets with 98.73% accuracy. Our findings reveal that negative, fear and sadness sentiments can be useful in detecting glaucoma. This study provides an effective mechanism for detecting glaucoma from Twitter messages.
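Extracting emotion features from tweets with an NRC-style affect lexicon, as a step before the logistic-regression classifier, might look like the following sketch; the lexicon entries and their intensities are invented stand-ins for the real Affect Intensity Lexicon:

```python
# Hypothetical NRC-style affect lexicon fragment: token -> {emotion: intensity}.
AFFECT = {
    "afraid": {"fear": 0.91},
    "blind": {"sadness": 0.6, "fear": 0.5},
    "hope": {"joy": 0.5},
    "pain": {"sadness": 0.7},
}

def emotion_features(tweet):
    """Sum per-emotion intensities over tweet tokens (classifier input)."""
    feats = {"fear": 0.0, "sadness": 0.0, "joy": 0.0, "anger": 0.0}
    for tok in tweet.lower().split():
        for emo, score in AFFECT.get(tok, {}).items():
            feats[emo] += score
    return feats
```

The resulting per-emotion totals, together with polarity scores, would form the feature vector passed to a multinomial logistic regression.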
- Published
- 2021
45. Converting raw transcripts into an annotated and turn-aligned TEI-XML corpus: the example of the Corpus of Serbian Forms of Address
- Author
-
Lemmenmeier-Batinić, Dolores, University of Zurich, and Lemmenmeier-Batinić, Dolores
- Subjects
Linguistics and Language, Computer science, P1-1091, Lexicon, Language and Linguistics, Transcription (linguistics), language biographical interviews, Philology. Linguistics, spoken Serbian, Interactional linguistics, Grammar, forms of address, Phonetics, Syntax, Artificial intelligence, data re-usability, Serbian, Natural language processing, XML
This paper describes the procedure of building a TEI-XML corpus of spoken Serbian starting from raw transcripts. The corpus consists of semi-structured interviews gathered with the aim of investigating forms of address in Serbian. The interviews were thoroughly transcribed according to the GAT transcription conventions. However, the transcription was carried out without tools that would control the validity of the GAT syntax or align the transcript with the audio recordings. In order to offer this resource to a broader audience, we resolved the inconsistencies in the original transcripts, normalised the semi-orthographic transcriptions and converted the corpus into a TEI format for transcriptions of speech. Further, we enriched the corpus by tagging and lemmatising the data. Lastly, we aligned the corpus turns to the corresponding audio segments using a forced-alignment tool. In addition to presenting the main steps involved in converting the corpus to the XML format, this paper also discusses current challenges in the processing of spoken data and the implications of data re-use regarding transcriptions of speech. The corpus can be used for studying Serbian from the perspective of interactional linguistics; for investigating the morphosyntax, grammar, lexicon and phonetics of spoken Serbian; for studying disfluencies; and for testing models for automatic speech recognition and forced alignment. The corpus is freely available for research purposes.
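A turn-aligned TEI transcription of speech boils down to utterance elements holding lemmatised word tokens with timing information. A minimal standard-library sketch follows; it uses an illustrative element subset with simplified `start`/`end` attributes, not the corpus's actual TEI schema:

```python
import xml.etree.ElementTree as ET

def make_utterance(who, words, start, end):
    """Build a toy TEI-like <u> element: speaker, timing, lemmatised <w> tokens."""
    u = ET.Element("u", who=who, start=start, end=end)
    for form, lemma in words:
        w = ET.SubElement(u, "w", lemma=lemma)
        w.text = form
    return u

# One turn: surface forms paired with their lemmas.
u = make_utterance("#SPK1", [("Kako", "kako"), ("ste", "biti")], "0.00", "1.25")
xml = ET.tostring(u, encoding="unicode")
```

A forced aligner supplies the per-turn timestamps, after which such elements are serialised into the corpus document.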
- Published
- 2021
46. Introduction to the special issue: Digital diplomacy in Africa
- Author
-
Bob Wekesa, Yarik Turianskyi, and Odilile Ayodele
- Subjects
Political science, Political Science and International Relations, Digital diplomacy, Media studies, The Internet, Social media, Lexicon, Diplomacy, Term (time)
Due to increased global access to the Internet and the advent of social media, diplomacy has been irreversibly changed. The term ‘digital diplomacy’, which entered the lexicon in the past two decades...
- Published
- 2021
47. Autoencoder for Semisupervised Multiple Emotion Detection of Conversation Transcripts
- Author
-
Hiroyuki Shindo, Yuji Matsumoto, and Duc-Anh Phan
- Subjects
Context model, Computer science, Context (language use), Lexicon, Autoencoder, Human-Computer Interaction, Word2vec, Conversation, Artificial intelligence, Computational linguistics, Affective computing, Software, Natural language processing
Textual emotion detection is a challenge in computational linguistics and affective computing, as it involves discovering all the associated emotions expressed within a given piece of text. It becomes an even more difficult problem when applied to conversation transcripts, as we need to model the spoken utterances between speakers while keeping in mind the context of the entire conversation. In this paper, we propose a semisupervised multilabel method for predicting emotions from conversation transcripts. The corpus contains conversational quotes extracted from movies; a small number of them are annotated, while the rest are used for unsupervised training. We use the word2vec word-embedding method to build an emotion lexicon from the corpus and to embed the utterances into vector representations. A deep-learning autoencoder is then used to discover the underlying structure of the unsupervised data. We fine-tune the learned model on labeled training data and measure its performance on a test set. The experimental results suggest that the method is effective and only slightly behind human annotators.
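Embedding an utterance from word2vec vectors, as described above, is commonly done by averaging the word vectors; a minimal sketch, with a toy two-dimensional vector table standing in for trained embeddings:

```python
# Toy embedding table standing in for trained word2vec vectors.
VECS = {"i": [0.1, 0.0], "am": [0.0, 0.2], "happy": [0.9, 0.4]}

def embed_utterance(utterance, dim=2):
    """Average the word vectors of an utterance (zeros for OOV words)."""
    toks = utterance.lower().split()
    v = [0.0] * dim
    for t in toks:
        for k, x in enumerate(VECS.get(t, [0.0] * dim)):
            v[k] += x
    return [x / max(len(toks), 1) for x in v]
```

The resulting fixed-length vectors are what an autoencoder would then compress to expose the structure of the unlabeled utterances.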
- Published
- 2021
48. Neural Attentive Network for Cross-Domain Aspect-Level Sentiment Classification
- Author
-
Xiaojun Chen, Ying Shen, Qiang Qu, Wenting Tu, Min Yang, and Wenpeng Yin
- Subjects
Artificial neural network, Computer science, Lexicon, Semantics, Latent Dirichlet allocation, Domain (software engineering), Human-Computer Interaction, Classifier (linguistics), Artificial intelligence, F1 score, Feature learning, Software, Natural language processing
This work takes the lead in studying aspect-level sentiment classification in the domain adaptation scenario. Given a document from any domain, the model needs to figure out the sentiments with respect to fine-grained aspects in the document. Two main challenges exist in this problem: one is to build robust document modeling across domains; the other is to mine the domain-specific aspects and make use of the sentiment lexicon. In this paper, we propose a novel approach, the Neural Attentive model for cross-domain Aspect-level sentiment CLassification (NAACL), which leverages the benefits of a supervised deep neural network as well as an unsupervised probabilistic generative model to strengthen representation learning. NAACL jointly learns two tasks: (i) a domain classifier, working on documents in both the source and target domains to recognize the domain information of input texts and transfer knowledge from the source domain to the target domain; in particular, a weakly supervised Latent Dirichlet Allocation model (wsLDA) is proposed to learn domain-specific aspect and sentiment lexicon representations, which are then used to calculate aspect/lexicon-aware document representations via a multi-view attention mechanism; and (ii) an aspect-level sentiment classifier, sharing the document modeling with the domain classifier, which makes use of the domain classification results and the aspect/sentiment-aware document representations to classify the aspect-level sentiment of the document in the domain adaptation scenario. NAACL is evaluated on both English and Chinese datasets with out-of-domain as well as in-domain setups. Quantitatively, the experiments demonstrate that NAACL has robust superiority over the compared methods in terms of classification accuracy and F1 score. The qualitative evaluation also shows that the proposed model is capable of reasonably paying attention to the words that are important for judging the sentiment polarity of the input text given an aspect.
- Published
- 2021
49. Lexicon Knowledge Boosted Interaction Graph Network for Adverse Drug Reaction Recognition From Social Media
- Author
-
Zhihao Yang, Hongfei Lin, Zhiheng Li, Yin Zhang, Jian Wang, and Lei Wang
- Subjects
Drug-Related Side Effects and Adverse Reactions, Computer science, Knowledge engineering, Lexicon, Pharmacovigilance, Health Information Management, Humans, Social media, Electrical and Electronic Engineering, Noun phrase, Computer Science Applications, Task analysis, Neural Networks, Computer, Artificial intelligence, Sentence, Natural language processing, Adverse drug reaction, Biotechnology
The World Health Organization underlines the significance of adverse drug reaction (ADR) reports for patient safety. In practice, many potential ADRs tend to be under-reported in post-market ADR surveillance. Recognizing ADRs from social media is therefore indispensable and could complement post-market ADR surveillance for more effective pharmacovigilance studies. However, previous approaches face two challenges: 1) ADRs show high expression variability in social media, so many potential ADRs are out-of-lexicon and difficult to recognize; and 2) most phrasal ADRs are non-standard mentions whose boundaries are difficult to identify accurately. To tackle these challenges, we design three interaction graphs and propose a neural network approach, the Interaction Graph Network (IGN). Specifically, to recognize more out-of-lexicon ADRs, noun phrases in the input sentence, in addition to the mentions in the ADR lexicon, are regarded as candidate phrases and their features are taken into consideration. Moreover, to accurately identify ADR boundaries, three word-phrase interaction graphs are designed to represent lexicon knowledge and are encoded using graph attention networks (GATs) to directly integrate various boundary and contextual information of candidate phrases into ADR recognition. Experimental results on two benchmark datasets show that IGN can recognize ADRs accurately and consistently outperforms other state-of-the-art approaches.
- Published
- 2021
50. Toward Integrated CNN-based Sentiment Analysis of Tweets for Scarce-resource Language—Hindi
- Author
-
Agam Madan, Qin Xin, Ankit Chaudhary, Shubham Shubham, Vedika Gupta, and Nikita Jain
- Subjects
Hindi, General Computer Science, Computer science, Sentiment analysis, Decision tree, Lexicon, Convolutional neural network, Mandarin Chinese, Support vector machine, Resource (project management), Artificial intelligence, Natural language processing
Linguistic resources for commonly used languages such as English and Mandarin Chinese are available in abundance, hence the large body of existing research in these languages. However, there are languages for which linguistic resources are scarce. One of these is Hindi: despite being the fourth-most spoken language, it still lacks richly populated linguistic resources, owing to the challenges involved in processing it. This article first explores machine learning-based approaches (Naive Bayes, Support Vector Machine, Decision Tree and Logistic Regression) to analyze the sentiment of Hindi-language text derived from Twitter. It then presents lexicon-based approaches (Hindi SentiWordNet, NRC Emotion Lexicon) for sentiment analysis in Hindi, while also proposing a domain-specific sentiment dictionary. Finally, an integrated model combining a convolutional neural network (CNN), a Recurrent Neural Network and Long Short-Term Memory is proposed to analyze sentiment from Hindi-language tweets, with a total of 23,767 tweets classified into positive, negative and neutral. The proposed CNN approach achieves an accuracy of 85%.
- Published
- 2021