Descriptor: "Pointwise mutual information" / Database: Complementary Index - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Pointwise mutual information"' showing total 28 results

1. Understanding the effects of negative (and positive) pointwise mutual information on word vectors.

Author: Salle, Alexandre and Villavicencio, Aline
Subjects: GEOMETRIC modeling, SEMANTICS, VOCABULARY, FACTORIZATION
Abstract: Despite the recent popularity of contextual word embeddings, static word embeddings still dominate lexical semantic tasks, making their study of continued relevance. A widely adopted family of such static word embeddings is derived by explicitly factorising the Pointwise Mutual Information (PMI) weighting of the co-occurrence matrix. As unobserved co-occurrences lead PMI to negative infinity, a common workaround is to clip negative PMI at 0. However, it is unclear what information is lost by collapsing negative PMI values to 0. To answer this question, we isolate and study the effects of negative (and positive) PMI on the semantics and geometry of models adopting factorisation of different PMI matrices. Word and sentence-level evaluations show that only accounting for positive PMI in the factorisation strongly captures both semantics and syntax, whereas using only negative PMI captures little of semantics but a surprising amount of syntactic information. Results also reveal that incorporating negative PMI induces stronger rank invariance of vector norms and directions, as well as improved rare word representations. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
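
Illustrative sketch for result 1 (not taken from the paper): the abstract notes that unobserved co-occurrences send PMI to negative infinity and that a common workaround clips negative PMI at 0. The pure-Python example below shows both steps on a toy co-occurrence table. The function names pmi_matrix and clip_positive and the counts are assumptions made for illustration only.

```python
import math


def pmi_matrix(counts):
    """PMI for every cell of a word-by-context co-occurrence count table.

    PMI(w, c) = log( p(w, c) / (p(w) * p(c)) ), with probabilities estimated
    from the counts. Cells with a zero count get -inf, which is the behaviour
    the abstract describes for unobserved co-occurrences.
    """
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    result = []
    for i, row in enumerate(counts):
        out = []
        for j, c in enumerate(row):
            if c == 0:
                out.append(float("-inf"))
            else:
                out.append(math.log((c * total) / (row_sums[i] * col_sums[j])))
        result.append(out)
    return result


def clip_positive(pmi):
    """Positive PMI: replace every negative value (including -inf) with 0."""
    return [[max(0.0, v) for v in row] for row in pmi]


# Toy symmetric counts (hypothetical): rows are words, columns are contexts.
counts = [
    [0, 4, 1],
    [4, 0, 1],
    [1, 1, 8],
]
pmi = pmi_matrix(counts)
ppmi = clip_positive(pmi)
```

After clipping, the unobserved pair (row 0, column 0) and the weakly negative pair (row 0, column 2) both become 0, so the two cases can no longer be told apart. That lost distinction is what the abstract says the paper investigates.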

2. Chinese text classification by combining Chinese-BERTology-wwm and GCN.

Author: Xue Xu, Yu Chang, Jianye An, and Yongqiang Du
Subjects: LANGUAGE models, NATURAL language processing, CHINESE language, COSINE function
Abstract: Text classification is an important and classic application in natural language processing (NLP). Recent studies have shown that graph neural networks (GNNs) are effective in tasks with rich structural relationships and serve as effective transductive learning approaches. Text representation learning methods based on large-scale pretraining can learn implicit but rich semantic information from text. However, few studies have comprehensively utilized the contextual semantic and structural information for Chinese text classification. Moreover, the existing GNN methods for text classification did not consider the applicability of their graph construction methods to long or short texts. In this work, we propose Chinese-BERTology-wwm-GCN, a framework that combines Chinese bidirectional encoder representations from transformers (BERT) series models with whole word masking (Chinese-BERTology-wwm) and the graph convolutional network (GCN) for Chinese text classification. When building the text graph, we use documents and words as nodes to construct a heterogeneous graph for the entire corpus. Specifically, we use the term frequency-inverse document frequency (TF-IDF) to construct the word-document edge weights. For long text corpora, we propose an improved pointwise mutual information (PMI*) measure for words according to their word co-occurrence distances to represent the weights of word-word edges. For short text corpora, the co-occurrence information between words is often limited. Therefore, we utilize cosine similarity to represent the word-word edge weights. During the training stage, we effectively combine the cross-entropy and hinge losses and use them to jointly train Chinese-BERTology-wwm and GCN. Experiments show that our proposed framework significantly outperforms the baselines on three Chinese benchmark datasets and achieves good performance even with few labeled training sets. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

3. A weighted-link graph neural network for lung cancer knowledge classification.

Author: Cheng, Ching-Hsue and Ji, Zheng-Ting
Subjects: CLASSIFICATION, LUNG cancer, KNOWLEDGE graphs, TUMOR classification, KNOWLEDGE representation (Information theory), NATURAL language processing, GRAPH algorithms
Abstract: Visualized knowledge representation can more effectively help the public gain knowledge about lung cancer prevention, diagnosis, treatment, and subsequent life. Therefore, this study collected articles on lung cancer, published between 2016 and 2021, from the well-known Web of Science database to analyze lung cancer literature. First, we used natural language processing to handle the collected text data, and then we used the latent Dirichlet allocation method to perform topic modeling and obtain the optimal topic numbers based on two coherence metrics for assigning the class of every article. Next, a PMI_2 weighting was proposed to build an initial weighted knowledge graph, and four graph neural network algorithms were used to train the initial weighted knowledge graph. In addition, we proposed a PMI_2 + link to improve the classification performance, and the additional links were obtained from the graph auto-encoder and graph convolutional network training. Once the best classification performance has been obtained, the resulting edge weights are representative. For visualized knowledge representation, we used the Neo4j tool to display the nodes and edge weights for the final literature knowledge. The results show that the use of the proposed PMI_2 + link to build a weighted graph has a better classification performance. Further, the proposed PMI_2 + link can effectively reduce the number of edges on the knowledge graphs and avoid insufficient GPU memory. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

4. Using Pointwise Mutual Information for Breast Cancer Health Disparities Research With SEER-Medicare Claims.

Author: Egleston, Brian L., Chanda, Ashis Kumar, Tian Bai, Fang, Carolyn Y., Bleicher, Richard J., and Vucetic, Slobodan
Subjects: BREAST cancer treatment, HEALTH equity, MACHINE learning, MEDICARE, DATA analysis
Abstract: Identification of procedures using International Classification of Diseases or Healthcare Common Procedure Coding System codes is challenging when conducting medical claims research. We demonstrate how Pointwise Mutual Information can be used to find associated codes. We apply the method to an investigation of racial differences in breast cancer outcomes. We used Surveillance Epidemiology and End Results (SEER) data linked to Medicare claims. We identified treatment using two methods. First, we used previously published definitions. Second, we augmented definitions using codes empirically identified by the Pointwise Mutual Information statistic. Similar to previous findings, we found that presentation differences between Black and White women closed much of the estimated survival curve gap. However, we found that survival disparities were completely eliminated with the augmented treatment definitions. We were able to control for a wider range of treatment patterns that might affect survival differences between Black and White women with breast cancer. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
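
Illustrative sketch for result 4 (not the authors' implementation): the abstract describes using PMI to find codes associated with a previously published treatment definition. One simple way to do this, assuming each claim is represented as a set of codes, is to score every code pair by how much more often the two codes share a claim than independence would predict, and then list the codes most associated with a seed code. The function names and the code labels SEED, X1, X2, and COMMON are hypothetical placeholders, not real ICD or HCPCS codes.

```python
import math
from collections import Counter
from itertools import combinations


def code_pmi(claims):
    """PMI for every pair of codes that appears together on at least one claim.

    Probabilities are claim-level: p(a) is the share of claims containing
    code a, and p(a, b) is the share of claims containing both codes.
    """
    n_claims = len(claims)
    single = Counter()
    pair = Counter()
    for codes in claims:
        unique = sorted(set(codes))  # sorted so each pair has one canonical order
        single.update(unique)
        pair.update(combinations(unique, 2))
    return {
        (a, b): math.log((n_ab * n_claims) / (single[a] * single[b]))
        for (a, b), n_ab in pair.items()
    }


def associated_codes(scores, seed):
    """Codes with positive PMI against the seed code, strongest first."""
    hits = []
    for (a, b), score in scores.items():
        if seed in (a, b) and score > 0:
            hits.append((b if a == seed else a, score))
    return sorted(hits, key=lambda item: -item[1])


# Hypothetical claims; each inner list holds the codes on one claim.
claims = [
    ["SEED", "X1", "COMMON"],
    ["SEED", "X1", "COMMON"],
    ["SEED", "COMMON"],
    ["COMMON"],
    ["COMMON", "X2"],
    ["COMMON"],
]
scores = code_pmi(claims)
ranked = associated_codes(scores, "SEED")
```

In this toy data COMMON appears on every claim, so its PMI with SEED is exactly 0 and it is not reported, while X1 appears only alongside SEED and is ranked first. PMI is known to favour rare events, so a sketch like this would normally be paired with a minimum-count threshold before any code is added to a definition.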

5. An information‐theoretic approach to the analysis of location and colocation patterns.

Author: van Dam, Alje, Gomez‐Lievano, Andres, Neffke, Frank, and Frenken, Koen
Subjects: LOCATION analysis, ECONOMIC geography, STATISTICAL hypothesis testing, INFERENTIAL statistics, STATISTICAL significance
Abstract: The study of location and colocation of economic activities lies at the heart of economic geography and related disciplines, but the indices used to quantify these patterns are often defined ad hoc and lack a clear statistical foundation. We propose a statistical framework to quantify location and colocation associations of economic activities using information‐theoretic measures. We relate the resulting measures to existing measures of revealed comparative advantage, localization, specialization, and coagglomeration and show how different measures derive from the same general framework. To support the use of these measures in hypothesis testing and statistical inference, we develop a Bayesian estimation approach to provide measures of uncertainty and statistical significance of the estimated quantities. We illustrate this framework in an application to an analysis of location and colocation patterns of occupations in US cities. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

6. USING A SEMANTIC FUZZY SYSTEM TO INTELLIGENT DOCUMENTS SUMMARIZATION.

Author: Amin, Ahmed E.
Subjects: INFORMATION technology, FUZZY systems, SINGULAR value decomposition, DEEP learning, FORECASTING
Abstract: Due to the information technology revolution, there are many and varied methods of document summarization to obtain specific information from documents. Automated summarization methods rely on identifying important points in all relevant documents to produce a concise summary. Therefore, this paper presents an intelligent classification-based automated summarization system using a semantic neuro-fuzzy approach. The proposed system consists of five integrated phases, which are the Document Pre-processing, the Intermediate Representation, the Index Matrices Weight Calculation, the Neuro-fuzzy System, and the Summary Generation, respectively. The first stage divides paragraphs into sentences and sentences into words, removing the most frequent words that do not carry any information and stripping suffixes and prefixes from each word to extract its "root". In the second stage, the Latent Semantic Index was used to produce the words/concepts matrix and concepts/sentences matrix. The third stage used the pointwise mutual information measure, which identifies the words that are particularly informative about the target word and provides the best weighting of association between words. The knowledge is then extracted using a neuro-fuzzy network learning technique in phase four, which encodes the learned knowledge in its structure as a set of fuzzy rules. In order to build a number of fuzzy models with an increasing number of input variables chosen by the user according to their rankings, a quick clustering technique is then implemented. Then, according to a user-defined confidence level, the summary is generated from the knowledge base by a better understanding of the fuzzy rules. The performance of the suggested model was assessed with Recall-Oriented Understudy for Gisting Evaluation (ROUGE) on the Document Understanding Conference (DUC) dataset, which showed improved results in comparison to previous strategies in terms of average accuracy, recall, and F-measure. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

7. Network regression analysis in transcriptome-wide association studies.

Author: Jin, Xiuyuan, Zhang, Liye, Ji, Jiadong, Ju, Tao, Zhao, Jinghua, and Yuan, Zhongshang
Subjects: DIASTOLIC blood pressure, REGRESSION analysis, SYSTOLIC blood pressure, FALSE positive error, GENOME-wide association studies
Abstract: Background: Transcriptome-wide association studies (TWASs) have shown great promise in interpreting the findings from genome-wide association studies (GWASs) and exploring the disease mechanisms, by integrating GWAS and eQTL mapping studies. Almost all TWAS methods focus on only one gene at a time, with the exception of only two published multiple-gene methods, which nevertheless fail to account for the inter-dependence as well as the network structure among multiple genes. This may lead to power loss in TWAS analysis, as complex diseases are often due to multiple genes that interact with each other as a biological network. We therefore developed a Network Regression method in a two-stage TWAS framework (NeRiT) to detect whether a given network is associated with the traits of interest. NeRiT adopts the flexible Bayesian Dirichlet process regression to obtain the gene expression prediction weights in the first stage, uses pointwise mutual information to represent the general between-node correlation in the second stage and can effectively take the network structure among different gene nodes into account. Results: Comprehensive and realistic simulations indicated NeRiT had calibrated type I error control for testing both the node effect and edge effect, and yielded higher power than the existing methods, especially in testing the edge effect. The results were consistent regardless of the GWAS sample size, the gene expression prediction model in the first step of TWAS, the network structure as well as the correlation pattern among different gene nodes. Real data applications through analyzing systolic blood pressure and diastolic blood pressure from UK Biobank showed that NeRiT can simultaneously identify the trait-related nodes as well as the trait-related edges. Conclusions: NeRiT is a powerful and efficient network regression method in TWAS. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

9. Automatic medical term extraction from Vietnamese clinical texts.

Author: Vo, Chau, Cao, Tru, Truong, Ngoc, Ngo, Trung, and Bui, Dai
Subjects: MEDICAL terminology, RECOMMENDER systems
Abstract: In this paper, we propose the first method for automatic Vietnamese medical term discovery and extraction from clinical texts. The method combines linguistic filtering based on our defined open patterns with nested term extraction and statistical ranking using C-value. It does not require annotated corpora, external data resources, parameter settings, or term length restriction. Besides its specialization in handling Vietnamese medical terms, another novelty is that it uses Pointwise Mutual Information to split nested terms and the disjunctive acceptance condition to extract them. Evaluated on real Vietnamese electronic medical records, it achieves a precision of about 74% and a recall of about 92% and proves stably effective with small datasets. It outperforms previous works in the same category, that is, those that use neither annotated corpora nor external data resources. Our method and empirical evaluation analysis can lay a foundation for further research and development in Vietnamese medical term discovery and extraction. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

10. Research on Improvement of N-grams Based Text Classification by Applying Pointwise Mutual Information Measures.

Author: GEORGIEVA-TRIFONOVA, Tsvetanka
Subjects: INFORMATION measurement, FEATURE selection, VECTOR spaces, CLASSIFICATION
Abstract: In the present paper, the text classification is examined, which is applied after extracting N-grams of words to obtain characteristics describing the text documents in the collection. The selection of the most important features in regard to the pre-defined categories is made. The built vector space model for representation of text documents is modified by pointwise mutual information (PMI) measures. The conducted experiments include computation of the accuracy and F-measure of text classification with different methods for feature selection, different number of selected attributes (N-grams of words) for different classifiers and different datasets. The results obtained show an improvement in the performance of the classification of short texts with unbalanced categories. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

11. Profiling and analysis of chemical compounds using pointwise mutual information.

Author: Čmelo, I., Voršilák, M., and Svozil, D.
Subjects: ANALYTICAL chemistry, INFORMATION theory, RANDOM forest algorithms, COMPLEX compounds, ORGANIC compounds
Abstract: Pointwise mutual information (PMI) is a measure of association used in information theory. In this paper, PMI is used to characterize several publicly available databases (DrugBank, ChEMBL, PubChem and ZINC) in terms of association strength between compound structural features, resulting in database PMI interrelation profiles. As structural features, substructure fragments obtained by coding individual compounds as MACCS, PubChemKey and ECFP fingerprints are used. The analysis of publicly available databases reveals, in accord with other studies, unusual properties of DrugBank compounds, which further confirms the validity of the PMI profiling approach. Z-standardized relative feature tightness (ZRFT), a PMI-derived measure that quantifies how well the given compound's feature combinations fit those in a particular compound set, is applied for the analysis of compound synthetic accessibility (SA), as well as for the classification of compounds as easy (ES) and hard (HS) to synthesize. ZRFT value distributions are compared with those of SYBA and SAScore. The analysis of ZRFT values of structurally complex compounds in the SAVI database reveals oligopeptide structures that are mispredicted by SAScore as HS, while correctly predicted by ZRFT and SYBA as ES. Compared to SAScore, SYBA and random forest, ZRFT predictions are less accurate, though by a narrow margin (AccZRFT = 94.5%, AccSYBA = 98.8%, AccSAScore = 99.0%, AccRF = 97.3%). However, ZRFT's ability to distinguish between ES and HS compounds is surprisingly high considering that while SYBA, SAScore and random forest are dedicated SA models, ZRFT is a generic measurement that merely quantifies the strength of interrelations between structural feature pairs. The results presented in the current work indicate that structural feature co-occurrence, quantified by PMI or ZRFT, contains a significant amount of information relevant to physico-chemical properties of organic compounds. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF

12. PMINR: Pointwise Mutual Information-Based Network Regression – With Application to Studies of Lung Cancer and Alzheimer's Disease.

Author: Lin, Weiqiang, Ji, Jiadong, Zhu, Yuchen, Li, Mingzhuo, Zhao, Jinghua, Xue, Fuzhong, and Yuan, Zhongshang
Subjects: BIOLOGICAL networks, ALZHEIMER'S disease, LUNG cancer, CANCER genes, DRUG development
Abstract: Complex diseases are believed to be the consequence of intracellular network(s) involving a range of factors. An improved understanding of a disease-predisposing biological network could lead to better identification of genes and pathways that confer disease risk and therefore inform drug development. The group difference in biological networks, as is often characterized by graphs of nodes and edges, is attributable to effects of these nodes and edges. Here we introduced pointwise mutual information (PMI) as a measure of the connection between a pair of nodes with either a linear relationship or nonlinear dependence. We then proposed a PMI-based network regression (PMINR) model to differentiate patterns of network changes (in node or edge) linking a disease outcome. Through simulation studies with various sample sizes and inter-node correlation structures, we showed that PMINR can accurately identify these changes with higher power than current methods and be robust to the network topology. Finally, we illustrated, with publicly available data on lung cancer and gene methylation data on aging and Alzheimer's disease, an evaluation of the practical performance of PMINR. We concluded that PMI is able to capture the generic inter-node correlation pattern in biological networks, and PMINR is a powerful and efficient approach for biological network analysis. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

13. Using automatic constructed thesauri instead of dictionaries in the verbal phraseological units validation task.

Author: Pinto, David, Priego, Belém, Singh, Vivek, and Perez, Fernando
Subjects: ENCYCLOPEDIAS & dictionaries, TASKS, VOCABULARY, FLIES, NATURAL language processing
Abstract: Automatic validation of compositionality vs non-compositionality is a very challenging problem in NLP. A very small number of papers in the literature report results on this particular problem. Recently, some new approaches have arisen with respect to this particular linguistic task. One of these approaches that has caught our attention is based on what its authors call "lexical domain". In this paper, we analyze the use of Pointwise Mutual Information for constructing thesauri on the fly, which can be further employed instead of dictionaries for determining whether a given phraseological unit is compositional. The experimental results reported in this paper show that this dissimilarity measure (PMI) can effectively be used when determining the compositionality of a given verbal phraseological unit. Moreover, we show that the use of thesauri improves the results obtained in comparison with those experiments employing dictionaries, highlighting the use of self-constructed lexical resources which are, in fact, taking advantage of the same vocabulary of the target dataset. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

14. Estimating spatiotemporal focus of documents using entropy with PMI.

Author: YAŞAR, Damla and TEKİR, Selma
Subjects: INFORMATION retrieval, TIME measurements, ESTIMATES
Abstract: Many text documents are spatiotemporal in nature, i.e. contents of a document can be mapped to a specific time period or location. For example, a news article about the French Revolution can be mapped to year 1789 as time and France as place. Identifying this time period and location associated with the document can be useful for various downstream applications such as document reasoning or spatiotemporal information retrieval. In this paper, temporal entropy with pointwise mutual information (PMI) is proposed to estimate the temporal focus of a document. PMI is used to measure the association of words with time expressions. Moreover, a word's temporal entropy is considered as a weight to its association with a time point and a single time point with the highest overall score is chosen as the focus time of a document. The proposed method is generic in the sense that it can also be applied for spatial focus estimation of documents. In the case of spatial entropy with PMI, PMI is used to calculate the association between words and place entities. The effectiveness of our proposed methods for spatiotemporal focus estimation is evaluated on diverse datasets of text documents. The experimental evaluation confirms the superiority of our proposed temporal and spatial focus estimation methods. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

15. Construction of competing endogenous RNA networks from paired RNA-seq data sets by pointwise mutual information.

Author: Lan, Chaowang, Peng, Hui, Hutvagner, Gyorgy, and Li, Jinyan
Subjects: RNA, NON-coding RNA, CROSSTALK, MICRORNA, MESSENGER RNA
Abstract: Background: A long noncoding RNA (lncRNA) can act as a competing endogenous RNA (ceRNA) to compete with an mRNA for binding to the same miRNA. Such an interplay between the lncRNA, miRNA, and mRNA is called a ceRNA crosstalk. As an miRNA may have multiple lncRNA targets and multiple mRNA targets, connecting all the ceRNA crosstalks mediated by the same miRNA forms a ceRNA network. Methods have been developed to construct ceRNA networks in the literature. However, these methods have limits because they have not explored the expression characteristics of total RNAs. Results: We proposed a novel method for constructing ceRNA networks and applied it to a paired RNA-seq data set. The first step of the method takes a competition regulation mechanism to derive candidate ceRNA crosstalks. Second, the method combines a competition rule and pointwise mutual information to compute a competition score for each candidate ceRNA crosstalk. Then, ceRNA crosstalks which have significant competition scores are selected to construct the ceRNA network. The key idea, pointwise mutual information, is ideally suitable for measuring the complex point-to-point relationships embedded in the ceRNA networks. Conclusion: Computational experiments and results demonstrate that the ceRNA networks can capture important regulatory mechanism of breast cancer, and have also revealed new insights into the treatment of breast cancer. The proposed method can be directly applied to other RNA-seq data sets for deeper disease understanding. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

16. Research on distribution-based Chinese word representations.

Author: 曹学飞, 李济洪, and 王瑞波
Subjects: MATRIX decomposition, ARTIFICIAL neural networks, TASK performance, PROBLEM solving, RESEMBLANCE (Philosophy)
Abstract: Copyright of Application Research of Computers / Jisuanji Yingyong Yanjiu is the property of Application Research of Computers Edition and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Published: 2019
Full Text: View/download PDF

17. Multinomial Naïve Bayes using similarity based conditional probability.

Author: Santhi, B. and Brindha, G.R.
Subjects: CONDITIONAL probability, CONTENT mining, SENTIMENT analysis, EXPONENTIAL functions, TEXT mining, BAYES' theorem
Abstract: The exponential growth of the Internet through the sharing of text content necessitates analysis to convert that content into useful information. Research areas such as Web mining, Opinion mining and Text mining focus on studies such as content mining, statistical analysis, prediction, and classification. Multinomial Naïve Bayes (MNB), a state-of-the-art Bayesian classifier, is the fastest and simplest text classifier. The objective of the proposed study is to enhance the classification by substituting the conditional probability of the existing MNB with a probability-based frequency computation. A new combination that consists of Pointwise Mutual Information (PMI) and different normalized Term Frequency (TF) is used for computing the conditional probability. The new combinations provide weight to the words based on the information gain carried by the words related to the document that belongs to a class. The robustness of Similarity based Enhanced Conditional Probability MNB (SECP-MNB) is reflected in classification accuracy measurement. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

18. A New Method for Sentiment Analysis Using Contextual Auto-Encoders.

Author: Ameur, Hanen, Jamoussi, Salma, and Hamadou, Abdelmajid Ben
Subjects: SENTIMENT analysis, MACHINE learning, ARTIFICIAL intelligence, SUPPORT vector machines, CLASSIFICATION algorithms
Abstract: Sentiment analysis, a hot research topic, presents new challenges for understanding users’ opinions and judgments expressed online. It aims to classify subjective texts by assigning them a polarity label. In this paper, we introduce a novel machine learning framework using an auto-encoder network to predict the sentiment polarity label at the word level and the sentence level. Inspired by the dimensionality reduction and the feature extraction capabilities of auto-encoders, we propose a new model for distributed word vector representation “PMI-SA” using as input pointwise-mutual-information “PMI” word vectors. The resulting continuous word vectors are combined to represent a sentence. An unsupervised sentence embedding method, called Contextual Recursive Auto-Encoders “CoRAE”, is also developed for learning sentence representation. Indeed, CoRAE follows the basic idea of the recursive auto-encoders to deeply compose the vectors of words constituting the sentence, but without relying on any syntactic parse tree. The CoRAE model consists in recursively combining each word with its context words (neighboring words: previous and next) by considering the word order. A support vector machine classifier with a fine-tuning technique is also used to show that our deep compositional representation model CoRAE significantly improves the accuracy of the sentiment analysis task. Experimental results demonstrate that CoRAE remarkably outperforms several competitive baseline methods on two databases, namely, Sanders twitter corpus and Facebook comments corpus. The CoRAE model achieves an efficiency of 83.28% with the Facebook dataset and 97.57% with the Sanders dataset. [ABSTRACT FROM AUTHOR]
Published: 2018
Full Text: View/download PDF

19. Feature-level sentiment analysis by using comparative domain corpora.

Author: Quan, Changqin and Ren, Fuji
Subjects: SENTIMENT analysis, ELECTRONIC commerce, SEMANTICS, DEPENDENCY grammar, LEXICON, AFFECTIVE computing
Abstract: Feature-level sentiment analysis (SA) is able to provide more fine-grained SA on certain opinion targets and has a wider range of applications on E-business. This study proposes an approach based on comparative domain corpora for feature-level SA. The proposed approach makes use of word associations for domain-specific feature extraction. First, we assign a similarity score for each candidate feature to denote its similarity extent to a domain. Then we identify domain features based on their similarity scores on different comparative domain corpora. After that, dependency grammar and a general sentiment lexicon are applied to extract and expand feature-oriented opinion words. Lastly, the semantic orientation of a domain-specific feature is determined based on the feature-oriented opinion lexicons. In evaluation, we compare the proposed method with several state-of-the-art methods (including unsupervised and semi-supervised) using a standard product review test collection. The experimental results demonstrate the effectiveness of using comparative domain corpora. [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF

20. IR based Task-Model Learning: Automating the hierarchical structuring of tasks.

Author: Yusuke Fukazawa, Kröll, Mark, Strohmaier, Markus, and Ota, Jun
Subjects: TASK analysis, HIERARCHICAL clustering (Cluster analysis), CLUSTER analysis (Statistics), ALGORITHMS, VIRTUAL communities, MATHEMATICAL models
Abstract: Task-models concretize general requests to support users in real-world scenarios. In this paper, we present an IR based algorithm (IRTML) to automate the construction of hierarchically structured task-models. In contrast to other approaches, our algorithm is capable of assigning general tasks closer to the top and specific tasks closer to the bottom. Connections between tasks are established by extending Turney's PMI-IR measure. To evaluate our algorithm, we manually created a ground truth in the health-care domain consisting of 14 domains. We compared the IRTML algorithm to three state-of-the-art algorithms to generate hierarchical structures, i.e. BiSection K-means, Formal Concept Analysis and Bottom-Up Clustering. Our results show that IRTML achieves a 25.9% taxonomic overlap with the ground truth, a 32.0% improvement over the compared algorithms. [ABSTRACT FROM AUTHOR]
Published: 2016
Full Text: View/download PDF

21. Nested term recognition driven by word connection strength.

Author: Marciniak, Małgorzata and Mykowiecka, Agnieszka
Subjects: CORPORA, TERMS & phrases, VOCABULARY, GRAMMAR, ORDER (Grammar)
Abstract: Domain corpora are often not very voluminous and even important terms can occur in them not as isolated maximal phrases but only within more complex constructions. Appropriate recognition of nested terms can thus influence the content of the extracted candidate term list and its order. We propose a new method for identifying nested terms based on a combination of two aspects: grammatical correctness and normalised pointwise mutual information (NPMI) counted for all bigrams in a given corpus. NPMI is typically used for recognition of strong word connections, but in our solution we use it to recognise the weakest points to suggest the best place for division of a phrase into two parts. By creating, at most, two nested phrases in each step, we introduce a binary term structure. We test the impact of the proposed method applied, together with the C-value ranking method, to the automatic term recognition task performed on three corpora, two in Polish and one in English. [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF
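
Note on entry 21: the abstract describes scoring every bigram with normalised pointwise mutual information (NPMI) and splitting a phrase at its weakest link. The following is a minimal sketch of that idea, not the authors' implementation. The function names `bigram_npmi` and `weakest_split` are hypothetical, the probabilities are estimated from adjacent-pair counts only (an assumption), and the grammatical-correctness check mentioned in the abstract is omitted.

```python
import math
from collections import Counter


def bigram_npmi(tokens):
    """Return {(w1, w2): NPMI} for every adjacent pair observed in `tokens`.

    Assumption: all probabilities are estimated from the adjacent pairs,
    i.e. p(x) is the share of pairs whose first word is x and p(y) the
    share of pairs whose second word is y.
    """
    pairs = list(zip(tokens, tokens[1:]))
    n = len(pairs)
    pair_counts = Counter(pairs)
    left = Counter(w1 for w1, _ in pairs)    # counts as first element
    right = Counter(w2 for _, w2 in pairs)   # counts as second element

    scores = {}
    for (w1, w2), count in pair_counts.items():
        if count == n:
            # p(x, y) = 1 makes NPMI 0/0; score it 1.0 by convention.
            scores[(w1, w2)] = 1.0
            continue
        p_xy = count / n
        pmi = math.log(p_xy / ((left[w1] / n) * (right[w2] / n)))
        scores[(w1, w2)] = pmi / -math.log(p_xy)
    return scores


def weakest_split(phrase, scores):
    """Index i of the weakest link, between phrase[i] and phrase[i + 1].

    Pairs never observed get -1.0, the limiting NPMI value for words
    that never co-occur.
    """
    links = [scores.get(pair, -1.0) for pair in zip(phrase, phrase[1:])]
    return min(range(len(links)), key=links.__getitem__)


corpus = "new york city new york times big city".split()
scores = bigram_npmi(corpus)
# Split "new york city" at the bigram with the lowest NPMI.
split_at = weakest_split(["new", "york", "city"], scores)
```

In this toy corpus "new york" always occurs together and scores close to 1, so the weakest link in "new york city" is the one between "york" and "city"; a real term-extraction pipeline would compute the scores over a full domain corpus.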

22. Human action recognition based on mid-level spatio-temporal features.

Author: 王泰青 and 王生进
Abstract: Copyright of Journal of Image & Graphics is the property of Editorial Office of Journal of Image & Graphics and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Published: 2015
Full Text: View/download PDF

23. Using Pointwise Mutual Information to Identify Implicit Features in Customer Reviews.

Author: Su, Qi, Xiang, Kun, Wang, Houfeng, Sun, Bin, and Yu, Shiwen
Abstract: This paper is concerned with automatic identification of implicit product features expressed in product reviews in the context of opinion question answering. Utilizing a polarity lexicon, we map each adjective in the lexicon to a set of predefined product features. According to the relationship between those opinion-oriented words and product features, we can identify which feature a review concerns even when no explicit feature nouns or phrases appear. The results of our experiments proved the validity of this method. [ABSTRACT FROM AUTHOR]
Published: 2006
Full Text: View/download PDF

24. Automatic Extraction of Property Norm-Like Data From Large Text Corpora.

Author: Kelly, Colin, Devereux, Barry, and Korhonen, Anna
Subjects: NATURAL language processing, AUTOMATIC extracting (Information science), ENTROPY (Information theory), CORPORA, METONYMS, DATA acquisition systems, SOCIAL norms
Abstract: Traditional methods for deriving property-based representations of concepts from text have focused on either extracting only a subset of possible relation types, such as hyponymy/hypernymy (e.g., car is-a vehicle) or meronymy/metonymy (e.g., car has wheels), or unspecified relations (e.g., car-petrol). We propose a system for the challenging task of automatic, large-scale acquisition of unconstrained, human-like property norms from large text corpora, and discuss the theoretical implications of such a system. We employ syntactic, semantic, and encyclopedic information to guide our extraction, yielding concept-relation-feature triples (e.g., car be fast, car require petrol, car cause pollution), which approximate property-based conceptual representations. Our novel method extracts candidate triples from parsed corpora (Wikipedia and the British National Corpus) using syntactically and grammatically motivated rules, then reweights triples with a linear combination of their frequency and four statistical metrics. We assess our system output in three ways: lexical comparison with norms derived from human-generated property norm data, direct evaluation by four human judges, and a semantic distance comparison with both WordNet similarity data and human-judged concept similarity ratings. Our system offers a viable and performant method of plausible triple extraction: Our lexical comparison shows comparable performance to the current state-of-the-art, while subsequent evaluations exhibit the human-like character of our generated properties. [ABSTRACT FROM AUTHOR]
Published: 2014
Full Text: View/download PDF

25. Improving Word Similarity by Augmenting PMI with Estimates of Word Polysemy.

Author: Lushan Han, Finin, Tim, McNamee, Paul, Joshi, Anupam, and Yesha, Yelena
Subjects: INFORMATION storage & retrieval systems, POLYSEMY, BIG data, QUERY languages (Computer science), INFORMATION filtering, ESTIMATES
Abstract: Pointwise mutual information (PMI) is a widely used word similarity measure, but it lacks a clear explanation of how it works. We explore how PMI differs from distributional similarity, and we introduce a novel metric, PMImax, that augments PMI with information about a word's number of senses. The coefficients of PMImax are determined empirically by maximizing a utility function based on the performance of automatic thesaurus generation. We show that it outperforms traditional PMI in the application of automatic thesaurus generation and in two word similarity benchmark tasks: human similarity ratings and TOEFL synonym questions. PMImax achieves a correlation coefficient comparable to the best knowledge-based approaches on the Miller-Charles similarity rating data set. [ABSTRACT FROM PUBLISHER]
Published: 2013
Full Text: View/download PDF

26. Leveraging sentiment analysis for topic detection.

Author: Cai, Keke, Spangler, Scott, Ying Chen, and Li Zhang
Subjects: SOCIAL media, RESEARCH on Internet users, STATISTICAL bootstrapping, CONSUMER behavior, SYSTEM analysis
Abstract: The emergence of new social media such as blogs, message boards, news, and web content in general has dramatically changed the ecosystems of corporations. Consumers, non-profit organizations, and other forms of communities are extremely vocal about their opinions and perceptions of companies and their brands on the web. The ability to leverage such “voice of the web” to gain consumer, brand, and market insights can be truly differentiating and valuable to today's corporations. In particular, one important form of insight can be derived from sentiment analysis of web content. Sentiment analysis traditionally emphasizes classification of web comments into positive, neutral, and negative categories. This paper goes beyond sentiment classification by focusing on techniques that can detect the topics that are highly correlated with positive and negative opinions. Such techniques, when coupled with sentiment classification, can help business analysts understand both the overall sentiment scope and the drivers behind the sentiment. In this paper, we describe our overall sentiment analysis system, which consists of such sentiment analysis techniques, including a bootstrapping method for word polarity weighting, automatic filtering and expansion of domain words, and a sentiment classification method. We then detail a novel topic detection method using point-wise mutual information and term frequency distribution. We demonstrate the effectiveness of our overall approach via several case studies on different social media data sets. [ABSTRACT FROM AUTHOR]
Published: 2010
Full Text: View/download PDF

27. Detecting Word Substitutions in Text.

Author: Szewang Fong, Roussinov, Dmitri, and Skillicorn, David B.
Subjects: CYBERTERRORISM, DATA protection, DATA security, COMPUTER security, EMAIL systems, INFORMATION technology security, COMPUTER networks, INFORMATION science, INFORMATION technology
Abstract: Searching for words on a watchlist is one way in which large-scale surveillance of communication can be done, for example, in intelligence and counterterrorism settings. One obvious defense is to replace words that might attract attention to a message with other more innocuous words. For example, the sentence "the attack will be tomorrow" might be altered to "the complex will be tomorrow," since "complex" is a word whose frequency is close to that of "attack." Such substitutions are readily detectable by humans since they do not make sense. We address the problem of detecting such substitutions automatically by looking for discrepancies between words and their contexts and using only syntactic information. We define a set of measures, each of which is quite weak, but which together produce per-sentence detection rates around 90 percent with false positive rates around 10 percent. Rules for combining per-sentence detection into per-message detection can reduce the false positive and false negative rates for messages to practical levels. We test the approach using sentences from the Enron e-mail and Brown corpora, representing informal and formal text, respectively. [ABSTRACT FROM AUTHOR]
Published: 2008
Full Text: View/download PDF

28. An Information-Theoretic Approach to Detect the Associations of GPS-Tracked Heifers in Pasture.

Author: Meckbach, Cornelia, Elsholz, Sabrina, Siede, Caroline, and Traulsen, Imke
Subjects: GLOBAL Positioning System, SOCIAL networks, HEIFERS, ANIMAL science, ANIMAL tracks
Abstract: Sensor technologies, such as the Global Navigation Satellite System (GNSS), produce huge amounts of data by tracking animal locations with high temporal resolution. Due to this high resolution, all animals show at least some co-occurrences, and the pure presence or absence of co-occurrences is not satisfactory for social network construction. Further, tracked animal contacts contain noise due to measurement errors or random co-occurrences. To identify significant associations, null models are commonly used, but the determination of an appropriate null model for GNSS data by maintaining the autocorrelation of tracks is challenging, and the construction is time and memory consuming. Bioinformaticians encounter phylogenetic background and random noise on sequencing data. They estimate this noise directly on the data by using the average product correction procedure, a method applied to information-theoretic measures. Using Global Positioning System (GPS) data of heifers in a pasture, we performed a proof of concept that this approach can be transferred to animal science for social network construction. The approach outputs stable results for up to 30% missing data points, and the predicted associations were in line with those of the null models. The effect of different distance thresholds for contact definition was marginal, but animal activity strongly affected the network structure. [ABSTRACT FROM AUTHOR]
Published: 2021
Full Text: View/download PDF
