666 results on '"Pointwise mutual information"'
Search Results
2. Low-Rank Approximation of Matrices for PMI-Based Word Embeddings
- Author
Sorokina, Alena, Karipbayeva, Aidana, Assylbekov, Zhenisbek, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, and Gelbukh, Alexander, editor
- Published
- 2023
3. An Ensemble Model for Detection of Adverse Drug Reactions
- Author
Ahmed A. Nafea, Mustafa S. Ibrahim, Abdulrahman A. Mukhlif, Mohammed M. AL-Ani, and Nazlia Omar
- Subjects
Adverse drug reactions, Classification, Ensemble Model, Machine Learning, Pointwise Mutual Information, Technology, Science
- Abstract
The detection of adverse drug reactions (ADRs) plays a necessary role in comprehending the safety and benefit profiles of medicines. Although spontaneous reporting stays the standard approach for ADR documents, it suffers from significant under reporting rates and limitations in terms of treatment inspection. This study proposes an ensemble model that combines decision trees, support vector machines, random forests, and adaptive boosting (ADA-boost) to improve ADR detection. The experimental evaluation applied the benchmark data set and many preprocessing techniques such as tokenization, stop-word removal, stemming, and utilization of Point-wise Mutual Information. In addition, two term representations, namely, term frequency-inverse document frequency and term frequency, are utilized. The proposed ensemble model achieves an F-measure of 89% on the dataset. The proposed ensemble model shows its ability in detecting ADR to be a favored option in achieving both accuracy and clarity.
- Published
- 2024
4. Understanding the effects of negative (and positive) pointwise mutual information on word vectors.
- Author
Salle, Alexandre and Villavicencio, Aline
- Subjects
GEOMETRIC modeling, SEMANTICS, VOCABULARY, FACTORIZATION
- Abstract
Despite the recent popularity of contextual word embeddings, static word embeddings still dominate lexical semantic tasks, making their study of continued relevance. A widely adopted family of such static word embeddings is derived by explicitly factorising the Pointwise Mutual Information (PMI) weighting of the co-occurrence matrix. As unobserved co-occurrences lead PMI to negative infinity, a common workaround is to clip negative PMI at 0. However, it is unclear what information is lost by collapsing negative PMI values to 0. To answer this question, we isolate and study the effects of negative (and positive) PMI on the semantics and geometry of models adopting factorisation of different PMI matrices. Word and sentence-level evaluations show that only accounting for positive PMI in the factorisation strongly captures both semantics and syntax, whereas using only negative PMI captures little of semantics but a surprising amount of syntactic information. Results also reveal that incorporating negative PMI induces stronger rank invariance of vector norms and directions, as well as improved rare word representations. [ABSTRACT FROM AUTHOR]
- Published
- 2023
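Illustrative note on the clipping described in this abstract: PMI is conventionally defined as log p(w, c) / (p(w) p(c)), so a pair that is never observed has PMI of negative infinity, and positive PMI (PPMI) replaces every negative value with 0. The sketch below shows that convention on a toy co-occurrence table. The function names and the counts are hypothetical, and this is not the authors' code or their factorisation pipeline.

```python
import math
from collections import Counter

def pmi_table(pair_counts):
    """Return {(word, context): PMI in bits} for every observed pair.

    Unobserved pairs are simply absent; their PMI would be -infinity.
    """
    total = sum(pair_counts.values())
    word_counts, context_counts = Counter(), Counter()
    for (word, context), n in pair_counts.items():
        word_counts[word] += n
        context_counts[context] += n
    return {
        (word, context): math.log2(n * total / (word_counts[word] * context_counts[context]))
        for (word, context), n in pair_counts.items()
        if n > 0
    }

def ppmi_table(pair_counts):
    """Clip negative PMI values at 0 (positive PMI)."""
    return {pair: max(0.0, value) for pair, value in pmi_table(pair_counts).items()}

# Hypothetical toy counts, chosen only to produce one positive and one negative PMI.
counts = {("cat", "purr"): 8, ("cat", "bark"): 1, ("dog", "purr"): 1, ("dog", "bark"): 8}
pmi = pmi_table(counts)
ppmi = ppmi_table(counts)
```

Here `pmi[("cat", "bark")]` is negative while `ppmi[("cat", "bark")]` is 0.0, which is exactly the information the abstract says is collapsed by clipping.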
5. Chinese text classification by combining Chinese-BERTology-wwm and GCN.
- Author
Xue Xu, Yu Chang, Jianye An, and Yongqiang Du
- Subjects
LANGUAGE models, NATURAL language processing, CHINESE language, COSINE function
- Abstract
Text classification is an important and classic application in natural language processing (NLP). Recent studies have shown that graph neural networks (GNNs) are effective in tasks with rich structural relationships and serve as effective transductive learning approaches. Text representation learning methods based on large-scale pretraining can learn implicit but rich semantic information from text. However, few studies have comprehensively utilized the contextual semantic and structural information for Chinese text classification. Moreover, the existing GNN methods for text classification did not consider the applicability of their graph construction methods to long or short texts. In this work, we propose Chinese-BERTology-wwm-GCN, a framework that combines Chinese bidirectional encoder representations from transformers (BERT) series models with whole word masking (Chinese-BERTology-wwm) and the graph convolutional network (GCN) for Chinese text classification. When building text graph, we use documents and words as nodes to construct a heterogeneous graph for the entire corpus. Specifically, we use the term frequency-inverse document frequency (TF-IDF) to construct the word-document edge weights. For long text corpora, we propose an improved pointwise mutual information (PMI*) measure for words according to their word co-occurrence distances to represent the weights of word-word edges. For short text corpora, the co-occurrence information between words is often limited. Therefore, we utilize cosine similarity to represent the word-word edge weights. During the training stage, we effectively combine the cross-entropy and hinge losses and use them to jointly train Chinese-BERTology-wwm and GCN. Experiments show that our proposed framework significantly outperforms the baselines on three Chinese benchmark datasets and achieves good performance even with few labeled training sets. [ABSTRACT FROM AUTHOR]
- Published
- 2023
6. A weighted-link graph neural network for lung cancer knowledge classification.
- Author
Cheng, Ching-Hsue and Ji, Zheng-Ting
- Subjects
CLASSIFICATION, LUNG cancer, KNOWLEDGE graphs, TUMOR classification, KNOWLEDGE representation (Information theory), NATURAL language processing, GRAPH algorithms
- Abstract
Visualized knowledge representation can more effectively help the public gain knowledge about lung cancer prevention, diagnosis, treatment, and subsequent life. Therefore, this study collected articles on lung cancer from the well-known Web of Science database to analyze lung cancer literature, and the text data were published between 2016 and 2021. First, we used natural language processing to handle the collected text data, and then we used the latent Dirichlet allocation method to perform topic modeling and obtain the optimal topic numbers based on two coherence metrics for assigning the class of every article. Next, a PMI_2 weighted was proposed to build an initial weighted knowledge graph, and four graph neural network algorithms were used to train the initial weighted knowledge graph. In addition, we proposed a PMI_2 + link to improve the classification performance, and the additional links were obtained from the graph auto-encoder and graph convolutional network training. When the best classification performance has been obtained, these edge weights have a representative. For visualized knowledge representation, we used the Neo4j tool to display the nodes and edge weights for the final literature knowledge. The results show that the use of the proposed PMI_2 + link to build a weighted graph has a better classification performance. Further, the proposed PMI_2 + link can effectively reduce the number of edges on the knowledge graphs and avoid insufficient GPU memory. [ABSTRACT FROM AUTHOR]
- Published
- 2023
7. Using Pointwise Mutual Information for Breast Cancer Health Disparities Research With SEER-Medicare Claims
- Author
Brian L. Egleston, Ashis Kumar Chanda, Tian Bai, Carolyn Y. Fang, Richard J. Bleicher, and Slobodan Vucetic
- Subjects
seer-medicare claims, machine learning, pointwise mutual information, breast cancer, health disparities, Psychology, BF1-990
- Abstract
Identification of procedures using International Classification of Diseases or Healthcare Common Procedure Coding System codes is challenging when conducting medical claims research. We demonstrate how Pointwise Mutual Information can be used to find associated codes. We apply the method to an investigation of racial differences in breast cancer outcomes. We used Surveillance Epidemiology and End Results (SEER) data linked to Medicare claims. We identified treatment using two methods. First, we used previously published definitions. Second, we augmented definitions using codes empirically identified by the Pointwise Mutual Information statistic. Similar to previous findings, we found that presentation differences between Black and White women closed much of the estimated survival curve gap. However, we found that survival disparities were completely eliminated with the augmented treatment definitions. We were able to control for a wider range of treatment patterns that might affect survival differences between Black and White women with breast cancer.
- Published
- 2023
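Illustrative note on finding associated codes with PMI: one plausible reading of the approach in this abstract is to treat each claim as a set of codes, estimate p(a), p(b) and p(a, b) as the fractions of claims containing each code and each pair, and rank the codes that co-occur with a known seed code by PMI. The sketch below assumes that reading. The code labels, the claim data, and the function names are all hypothetical, and this is not the authors' implementation.

```python
import math
from collections import Counter
from itertools import combinations

def code_pmi(claims):
    """Return {(code_a, code_b): PMI in bits} for every code pair seen together in a claim."""
    n_claims = len(claims)
    single = Counter()
    pair = Counter()
    for codes in claims:
        unique = sorted(set(codes))            # sort so each pair has one canonical order
        single.update(unique)
        pair.update(combinations(unique, 2))
    return {
        (a, b): math.log2(n_ab * n_claims / (single[a] * single[b]))
        for (a, b), n_ab in pair.items()
    }

def associated_codes(claims, seed, min_pmi=0.0):
    """List (code, PMI) for codes whose PMI with `seed` exceeds `min_pmi`, strongest first."""
    scores = []
    for (a, b), value in code_pmi(claims).items():
        if seed in (a, b) and value > min_pmi:
            scores.append((b if a == seed else a, value))
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical claims; the labels are placeholders, not real ICD or HCPCS codes.
claims = [
    {"SURG_A", "SURG_B"},
    {"SURG_A", "SURG_B", "VISIT"},
    {"SURG_A", "SURG_B"},
    {"VISIT", "LAB"},
    {"VISIT"},
    {"VISIT", "LAB"},
]
related = associated_codes(claims, "SURG_A")
```

With these toy claims only `SURG_B` is returned for the seed `SURG_A`; `VISIT` co-occurs once but less often than chance would predict, so its PMI is negative. PMI is known to inflate scores for rare events, so a minimum-count filter would be a sensible addition on real data.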
8. Modified Pointwise Mutual Information-Based Feature Selection for Text Classification
- Author
Georgieva-Trifonova, Tsvetanka, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, and Arai, Kohei, editor
- Published
- 2022
9. Network regression analysis in transcriptome-wide association studies
- Author
Xiuyuan Jin, Liye Zhang, Jiadong Ji, Tao Ju, Jinghua Zhao, and Zhongshang Yuan
- Subjects
TWAS, Biological networks, Dirichlet process regression, Pointwise mutual information, Blood pressure, Biotechnology, TP248.13-248.65, Genetics, QH426-470
- Abstract
Background: Transcriptome-wide association studies (TWASs) have shown great promise in interpreting the findings from genome-wide association studies (GWASs) and exploring the disease mechanisms, by integrating GWAS and eQTL mapping studies. Almost all TWAS methods only focus on one gene at a time, with exception of only two published multiple-gene methods nevertheless failing to account for the inter-dependence as well as the network structure among multiple genes, which may lead to power loss in TWAS analysis as complex disease often owe to multiple genes that interact with each other as a biological network. We therefore developed a Network Regression method in a two-stage TWAS framework (NeRiT) to detect whether a given network is associated with the traits of interest. NeRiT adopts the flexible Bayesian Dirichlet process regression to obtain the gene expression prediction weights in the first stage, uses pointwise mutual information to represent the general between-node correlation in the second stage and can effectively take the network structure among different gene nodes into account. Results: Comprehensive and realistic simulations indicated NeRiT had calibrated type I error control for testing both the node effect and edge effect, and yields higher power than the existed methods, especially in testing the edge effect. The results were consistent regardless of the GWAS sample size, the gene expression prediction model in the first step of TWAS, the network structure as well as the correlation pattern among different gene nodes. Real data applications through analyzing systolic blood pressure and diastolic blood pressure from UK Biobank showed that NeRiT can simultaneously identify the trait-related nodes as well as the trait-related edges. Conclusions: NeRiT is a powerful and efficient network regression method in TWAS.
- Published
- 2022
10. Using Pointwise Mutual Information for Breast Cancer Health Disparities Research With SEER-Medicare Claims.
- Author
Egleston, Brian L., Chanda, Ashis Kumar, Tian Bai, Fang, Carolyn Y., Bleicher, Richard J., and Vucetic, Slobodan
- Subjects
BREAST cancer treatment, HEALTH equity, MACHINE learning, MEDICARE, DATA analysis
- Abstract
Identification of procedures using International Classification of Diseases or Healthcare Common Procedure Coding System codes is challenging when conducting medical claims research. We demonstrate how Pointwise Mutual Information can be used to find associated codes. We apply the method to an investigation of racial differences in breast cancer outcomes. We used Surveillance Epidemiology and End Results (SEER) data linked to Medicare claims. We identified treatment using two methods. First, we used previously published definitions. Second, we augmented definitions using codes empirically identified by the Pointwise Mutual Information statistic. Similar to previous findings, we found that presentation differences between Black and White women closed much of the estimated survival curve gap. However, we found that survival disparities were completely eliminated with the augmented treatment definitions. We were able to control for a wider range of treatment patterns that might affect survival differences between Black and White women with breast cancer. [ABSTRACT FROM AUTHOR]
- Published
- 2023
11. An information‐theoretic approach to the analysis of location and colocation patterns.
- Author
van Dam, Alje, Gomez‐Lievano, Andres, Neffke, Frank, and Frenken, Koen
- Subjects
LOCATION analysis, ECONOMIC geography, STATISTICAL hypothesis testing, INFERENTIAL statistics, STATISTICAL significance
- Abstract
The study of location and colocation of economic activities lies at the heart of economic geography and related disciplines, but the indices used to quantify these patterns are often defined ad hoc and lack a clear statistical foundation. We propose a statistical framework to quantify location and colocation associations of economic activities using information‐theoretic measures. We relate the resulting measures to existing measures of revealed comparative advantage, localization, specialization, and coagglomeration and show how different measures derive from the same general framework. To support the use of these measures in hypothesis testing and statistical inference, we develop a Bayesian estimation approach to provide measures of uncertainty and statistical significance of the estimated quantities. We illustrate this framework in an application to an analysis of location and colocation patterns of occupations in US cities. [ABSTRACT FROM AUTHOR]
- Published
- 2023
12. Network regression analysis for binary and ordinal categorical phenotypes in transcriptome-wide association studies.
- Author
Liye Zhang, Tao Ju, Xiuyuan Jin, Jiadong Ji, Jiayi Han, Xiang Zhou, and Zhongshang Yuan
- Subjects
GENE expression, GENE expression profiling, LOGISTIC regression analysis, ODDS ratio, PHENOTYPES
- Abstract
Transcriptome-wide association studies aim to integrate genome-wide association studies and expression quantitative trait loci mapping studies for exploring the gene regulatory mechanisms underlying diseases. Existing transcriptome-wide association study methods primarily focus on 1 gene at a time. However, complex diseases are seldom resulted from the abnormality of a single gene, but from the biological network involving multiple genes. In addition, binary or ordinal categorical phenotypes are commonly encountered in biomedicine. We develop a proportional odds logistic model for network regression in transcriptome-wide association study, Proportional Odds LOgistic model for NEtwork regression in Transcriptome-wide association study, to detect the association between a network and binary or ordinal categorical phenotype. Proportional Odds LOgistic model for NEtwork regression in Transcriptome-wide association study relies on 2-stage transcriptome-wide association study framework. It first adopts the distribution-robust nonparametric Dirichlet process regression model in expression quantitative trait loci study to obtain the SNP effect estimate on each gene within the network. Then, Proportional Odds LOgistic model for NEtwork regression in Transcriptome-wide association study uses pointwise mutual information to represent the general relationship among the network nodes of predicted gene expression in genome-wide association study, followed by the association analysis with all nodes and edges involved in proportional odds logistic model. A key feature of Proportional Odds LOgistic model for NEtwork regression in Transcriptome-wide association study is its ability to simultaneously identify the disease-related network nodes or edges. With extensive realistic simulations including those under various between-node correlation patterns, we show Proportional Odds LOgistic model for NEtwork regression in Transcriptome-wide association study can provide calibrated type I error control and yield higher power than other existing methods. We finally apply Proportional Odds LOgistic model for NEtwork regression in Transcriptome-wide association study to analyze bipolar and major depression status and blood pressure from UK Biobank to illustrate its benefits in real data analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2022
13. USING A SEMANTIC FUZZY SYSTEM TO INTELLIGENT DOCUMENTS SUMMARIZATION.
- Author
Amin, Ahmed E.
- Subjects
INFORMATION technology, FUZZY systems, SINGULAR value decomposition, DEEP learning, FORECASTING
- Abstract
Due to the information technology revolution, there are many and varied methods of document summarization to obtain specific information from documents. Automated summarization methods rely on identifying important points in all relevant documents to produce a concise summary. Therefore, this paper presents an intelligent classification-based automated summarization system using a semantic neuro-fuzzy approach. The proposed system consists of five integrated phases, which are the Document Pre-processing, the intermediate representation, the Index Matrices Weight Calculation, the Neuro fuzzy system, and the Summary Generation, respectively. The first stage divides paragraphs into sentences and sentences into words, by removing the most frequent words that do not carry any information and stripping the word from suffixes and prefixes to extract the « root » of the words. In the second stage, the Latent Semantic Index was used to produce the words/concepts matrix and concepts/sentences matrix. The third stage used the pointwise mutual information measure that defines particularly informative about the target word, as well as the best weighting of association between words. The knowledge is then extracted using a neuro-fuzzy network learning technique in phase four, which encodes the learned knowledge in its structure as a set of fuzzy rules. In order to build a number of fuzzy models with an increasing number of input variables chosen by the user according to their rankings, a quick clustering technique is then implemented. Then, according to a user-defined confidence level, the summary is generated from the knowledge base by a better understanding of the fuzzy rules. Recall-Oriented Understudy for Gisting Evaluation (ROUGE), which showed improved results in comparison to previous strategies in terms of average accuracy, recall, and F-measure in the document understanding conference (DUC) dataset, was used to assess the performance of the suggested model. [ABSTRACT FROM AUTHOR]
- Published
- 2022
14. Collocation
- Author
McGillivray, Barbara, Tóth, Gábor Mihály, McGillivray, Barbara, and Tóth, Gábor Mihály
- Published
- 2020
15. Network regression analysis in transcriptome-wide association studies.
- Author
Jin, Xiuyuan, Zhang, Liye, Ji, Jiadong, Ju, Tao, Zhao, Jinghua, and Yuan, Zhongshang
- Subjects
DIASTOLIC blood pressure, REGRESSION analysis, SYSTOLIC blood pressure, FALSE positive error, GENOME-wide association studies
- Abstract
Background: Transcriptome-wide association studies (TWASs) have shown great promise in interpreting the findings from genome-wide association studies (GWASs) and exploring the disease mechanisms, by integrating GWAS and eQTL mapping studies. Almost all TWAS methods only focus on one gene at a time, with exception of only two published multiple-gene methods nevertheless failing to account for the inter-dependence as well as the network structure among multiple genes, which may lead to power loss in TWAS analysis as complex disease often owe to multiple genes that interact with each other as a biological network. We therefore developed a Network Regression method in a two-stage TWAS framework (NeRiT) to detect whether a given network is associated with the traits of interest. NeRiT adopts the flexible Bayesian Dirichlet process regression to obtain the gene expression prediction weights in the first stage, uses pointwise mutual information to represent the general between-node correlation in the second stage and can effectively take the network structure among different gene nodes into account. Results: Comprehensive and realistic simulations indicated NeRiT had calibrated type I error control for testing both the node effect and edge effect, and yields higher power than the existed methods, especially in testing the edge effect. The results were consistent regardless of the GWAS sample size, the gene expression prediction model in the first step of TWAS, the network structure as well as the correlation pattern among different gene nodes. Real data applications through analyzing systolic blood pressure and diastolic blood pressure from UK Biobank showed that NeRiT can simultaneously identify the trait-related nodes as well as the trait-related edges. Conclusions: NeRiT is a powerful and efficient network regression method in TWAS. [ABSTRACT FROM AUTHOR]
- Published
- 2022
16. Automatic medical term extraction from Vietnamese clinical texts.
- Author
Vo, Chau, Cao, Tru, Truong, Ngoc, Ngo, Trung, and Bui, Dai
- Subjects
MEDICAL terminology, RECOMMENDER systems
- Abstract
In this paper, we propose the first method for automatic Vietnamese medical term discovery and extraction from clinical texts. The method combines linguistic filtering based on our defined open patterns with nested term extraction and statistical ranking using C-value. It does not require annotated corpora, external data resources, parameter settings, or term length restriction. Beside its specialty in handling Vietnamese medical terms, another novelty is that it uses Pointwise Mutual Information to split nested terms and the disjunctive acceptance condition to extract them. Evaluated on real Vietnamese electronic medical records, it achieves a precision of about 74% and recall of about 92% and is proved stably effective with small datasets. It outperforms the previous works in the same category of not using annotated corpora and external data resources. Our method and empirical evaluation analysis can lay a foundation for further research and development in Vietnamese medical term discovery and extraction. [ABSTRACT FROM AUTHOR]
- Published
- 2022
17. Task-specific dependency-based word embedding methods.
- Author
Wei, Chengwei, Wang, Bin, and Kuo, C.-C. Jay
- Subjects
NATURAL language processing, TASKS
- Abstract
• Two task-specific dependency-based word embedding methods are proposed.
• Our methods exploit the dependency parse tree to construct more effective contexts.
• Word-context and word-class info is merged to enhance text classification accuracy.
• Our methods outperform all word embedding baselines on text classification.
While most traditional word embedding methods target generic tasks, two task-specific dependency-based word embedding methods are proposed for better performance in text classification tasks in this work. First, we exploit the dependency parsing tree structure to capture the structural information of a sentence, and develop a method called dependency-based word embedding (DWE). It finds keywords and neighbor words of a target word as contexts via dependency parsing. Next, we leverage the word-class co-occurrence statistics to model the class distributional information and incorporate it into the embedding learning process. This leads to the class-enhanced dependency-based word embedding (CEDWE) method. Task-specific corpora and the matrix-factorization-based framework are used to train DWE and CEDWE. Seven text classification datasets are used to evaluate the performance of DWE and CEDWE, and experimental results show that they outperform several state-of-the-art word embedding methods. [ABSTRACT FROM AUTHOR]
- Published
- 2022
18. Profiling and analysis of chemical compounds using pointwise mutual information
- Author
I. Čmelo, M. Voršilák, and D. Svozil
- Subjects
Hashed fingerprint, Structural key, Information theory, Pointwise mutual information, Synthetic accessibility, Information technology, T58.5-58.64, Chemistry, QD1-999
- Abstract
Pointwise mutual information (PMI) is a measure of association used in information theory. In this paper, PMI is used to characterize several publicly available databases (DrugBank, ChEMBL, PubChem and ZINC) in terms of association strength between compound structural features resulting in database PMI interrelation profiles. As structural features, substructure fragments obtained by coding individual compounds as MACCS, PubChemKey and ECFP fingerprints are used. The analysis of publicly available databases reveals, in accord with other studies, unusual properties of DrugBank compounds which further confirms the validity of PMI profiling approach. Z-standardized relative feature tightness (ZRFT), a PMI-derived measure that quantifies how well the given compound’s feature combinations fit these in a particular compound set, is applied for the analysis of compound synthetic accessibility (SA), as well as for the classification of compounds as easy (ES) and hard (HS) to synthesize. ZRFT value distributions are compared with these of SYBA and SAScore. The analysis of ZRFT values of structurally complex compounds in the SAVI database reveals oligopeptide structures that are mispredicted by SAScore as HS, while correctly predicted by ZRFT and SYBA as ES. Compared to SAScore, SYBA and random forest, ZRFT predictions are less accurate, though by a narrow margin (Acc ZRFT = 94.5%, Acc SYBA = 98.8%, Acc SAScore = 99.0%, Acc RF = 97.3%). However, ZRFT ability to distinguish between ES and HS compounds is surprisingly high considering that while SYBA, SAScore and random forest are dedicated SA models, ZRFT is a generic measurement that merely quantifies the strength of interrelations between structural feature pairs. The results presented in the current work indicate that structural feature co-occurrence, quantified by PMI or ZRFT, contains a significant amount of information relevant to physico-chemical properties of organic compounds.
- Published
- 2021
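Illustrative note on the measure named in this abstract: the standard definition of PMI between two events x and y (here, two structural features occurring in the same compound) compares their joint probability with what independence would predict. The probabilities in the worked line are assumed values for illustration only and do not come from the article.

```latex
\[
\operatorname{pmi}(x;y) = \log_2 \frac{p(x,y)}{p(x)\,p(y)}
\]
% Assumed illustrative values: p(x) = 0.1, p(y) = 0.2, p(x,y) = 0.05
\[
\operatorname{pmi}(x;y) = \log_2 \frac{0.05}{0.1 \times 0.2} = \log_2 2.5 \approx 1.32 \text{ bits}
\]
```

A positive value means the two features occur together more often than independence would predict, a value of 0 means they behave as if independent, and a negative value means they co-occur less often than expected.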
19. Sentiment Analysis Using Weight Model Based on SentiWordNet 3.0
- Author
Kumar, Jitendra, Rout, Jitendra Kumar, katiyar, Anshu, Jena, Sanjay Kumar, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Sa, Pankaj Kumar, editor, Bakshi, Sambit, editor, Hatzilygeroudis, Ioannis K., editor, and Sahoo, Manmath Narayan, editor
- Published
- 2018
20. Research on Improvement of N-grams Based Text Classification by Applying Pointwise Mutual Information Measures.
- Author
GEORGIEVA-TRIFONOVA, Tsvetanka
- Subjects
INFORMATION measurement, FEATURE selection, VECTOR spaces, CLASSIFICATION
- Abstract
In the present paper, the text classification is examined, which is applied after extracting N-grams of words to obtain characteristics describing the text documents in the collection. The selection of the most important features in regard to the pre-defined categories is made. The built vector space model for representation of text documents is modified by pointwise mutual information (PMI) measures. The conducted experiments include computation of the accuracy and F-measure of text classification with different methods for feature selection, different number of selected attributes (N-grams of words) for different classifiers and different datasets. The results obtained show an improvement in the performance of the classification of short texts with unbalanced categories. [ABSTRACT FROM AUTHOR]
- Published
- 2021
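Illustrative note on PMI-based feature selection over word N-grams: a common generic formulation scores each N-gram by its PMI with each category, using document frequencies, and keeps the N-grams with the highest score over all categories. The sketch below assumes that generic formulation. It is not the modified measures studied in the article, and the function names and example documents are hypothetical.

```python
import math
from collections import Counter

def word_ngrams(tokens, n=2):
    """Return the word n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def select_features(docs, k=2, n=2):
    """docs: list of (tokens, label). Return the k n-grams with the highest
    max-over-categories PMI, computed from document frequencies."""
    total = len(docs)
    class_df = Counter(label for _, label in docs)
    term_df = Counter()
    joint_df = Counter()
    for tokens, label in docs:
        for gram in set(word_ngrams(tokens, n)):   # count each n-gram once per document
            term_df[gram] += 1
            joint_df[gram, label] += 1
    best = {}
    for (gram, label), n_tc in joint_df.items():
        score = math.log2(n_tc * total / (term_df[gram] * class_df[label]))
        best[gram] = max(score, best.get(gram, float("-inf")))
    # Highest score first; ties broken alphabetically so the result is deterministic.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))[:k]

# Hypothetical short texts with two categories.
docs = [
    ("great phone battery".split(), "pos"),
    ("great phone screen".split(), "pos"),
    ("bad phone battery".split(), "neg"),
    ("bad phone screen".split(), "neg"),
]
selected = select_features(docs, k=2, n=2)
```

On this toy data the bigrams "bad phone" and "great phone" are selected, because each appears in only one category, while "phone battery" and "phone screen" are spread evenly across both and score 0.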
21. Construction of competing endogenous RNA networks from paired RNA-seq data sets by pointwise mutual information
- Author
Chaowang Lan, Hui Peng, Gyorgy Hutvagner, and Jinyan Li
- Subjects
Competing endogenous RNA, Pointwise mutual information, Competition rule, Biotechnology, TP248.13-248.65, Genetics, QH426-470
- Abstract
Background: A long noncoding RNA (lncRNA) can act as a competing endogenous RNA (ceRNA) to compete with an mRNA for binding to the same miRNA. Such an interplay between the lncRNA, miRNA, and mRNA is called a ceRNA crosstalk. As an miRNA may have multiple lncRNA targets and multiple mRNA targets, connecting all the ceRNA crosstalks mediated by the same miRNA forms a ceRNA network. Methods have been developed to construct ceRNA networks in the literature. However, these methods have limits because they have not explored the expression characteristics of total RNAs. Results: We proposed a novel method for constructing ceRNA networks and applied it to a paired RNA-seq data set. The first step of the method takes a competition regulation mechanism to derive candidate ceRNA crosstalks. Second, the method combines a competition rule and pointwise mutual information to compute a competition score for each candidate ceRNA crosstalk. Then, ceRNA crosstalks which have significant competition scores are selected to construct the ceRNA network. The key idea, pointwise mutual information, is ideally suitable for measuring the complex point-to-point relationships embedded in the ceRNA networks. Conclusion: Computational experiments and results demonstrate that the ceRNA networks can capture important regulatory mechanism of breast cancer, and have also revealed new insights into the treatment of breast cancer. The proposed method can be directly applied to other RNA-seq data sets for deeper disease understanding.
- Published
- 2019
22. Topic-Based Sentiment Analysis
- Author
Buddhitha, Prasadith, Inkpen, Diana, Diniz Junqueira Barbosa, Simone, Series editor, Chen, Phoebe, Series editor, Du, Xiaoyong, Series editor, Filipe, Joaquim, Series editor, Kara, Orhun, Series editor, Kotenko, Igor, Series editor, Liu, Ting, Series editor, Sivalingam, Krishna M., Series editor, Washio, Takashi, Series editor, Lossio-Ventura, Juan Antonio, editor, and Alatrista-Salas, Hugo, editor
- Published
- 2017
23. Topic Modeling Based on Frequent Sequences Graphs
- Author
Ozdzynski, Piotr, Zakrzewska, Danuta, Kacprzyk, Janusz, Series editor, Pal, Nikhil R., Advisory editor, Bello Perez, Rafael, Advisory editor, Corchado, Emilio, Advisory editor, Hagras, Hani, Advisory editor, Kóczy, László T., Advisory editor, Kreinovich, Vladik, Advisory editor, Lin, Chin-Teng, Advisory editor, Lu, Jie, Advisory editor, Melin, Patricia, Advisory editor, Nedjah, Nadia, Advisory editor, Nguyen, Ngoc Thanh, Advisory editor, Wang, Jun, Advisory editor, Świątek, Jerzy, editor, and Tomczak, Jakub M., editor
- Published
- 2017
24. Neural Induction of a Lexicon for Fast and Interpretable Stance Classification
- Author
-
Clos, Jérémie, Wiratunga, Nirmalie, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Gracia, Jorge, editor, Bond, Francis, editor, McCrae, John P., editor, Buitelaar, Paul, editor, Chiarcos, Christian, editor, and Hellmann, Sebastian, editor
- Published
- 2017
25. Technical Aspect Extraction from Customer Reviews Based on Seeded Word Clustering
- Author
-
Davril, Jean-Marc, Leclercq, Tony, Cordy, Maxime, Heymans, Patrick, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Frasincar, Flavius, editor, Ittoo, Ashwin, editor, Nguyen, Le Minh, editor, and Métais, Elisabeth, editor
- Published
- 2017
26. Profiling and analysis of chemical compounds using pointwise mutual information.
- Author
-
Čmelo, I., Voršilák, M., and Svozil, D.
- Subjects
*ANALYTICAL chemistry, *INFORMATION theory, *RANDOM forest algorithms, *COMPLEX compounds, *ORGANIC compounds - Abstract
Pointwise mutual information (PMI) is a measure of association used in information theory. In this paper, PMI is used to characterize several publicly available databases (DrugBank, ChEMBL, PubChem and ZINC) in terms of association strength between compound structural features, resulting in database PMI interrelation profiles. As structural features, substructure fragments obtained by coding individual compounds as MACCS, PubChemKey and ECFP fingerprints are used. The analysis of publicly available databases reveals, in accord with other studies, unusual properties of DrugBank compounds, which further confirms the validity of the PMI profiling approach. Z-standardized relative feature tightness (ZRFT), a PMI-derived measure that quantifies how well the given compound's feature combinations fit those in a particular compound set, is applied for the analysis of compound synthetic accessibility (SA), as well as for the classification of compounds as easy (ES) and hard (HS) to synthesize. ZRFT value distributions are compared with those of SYBA and SAScore. The analysis of ZRFT values of structurally complex compounds in the SAVI database reveals oligopeptide structures that are mispredicted by SAScore as HS, while correctly predicted by ZRFT and SYBA as ES. Compared to SAScore, SYBA and random forest, ZRFT predictions are less accurate, though by a narrow margin (AccZRFT = 94.5%, AccSYBA = 98.8%, AccSAScore = 99.0%, AccRF = 97.3%). However, ZRFT's ability to distinguish between ES and HS compounds is surprisingly high considering that while SYBA, SAScore and random forest are dedicated SA models, ZRFT is a generic measurement that merely quantifies the strength of interrelations between structural feature pairs. The results presented in the current work indicate that structural feature co-occurrence, quantified by PMI or ZRFT, contains a significant amount of information relevant to physico-chemical properties of organic compounds. [ABSTRACT FROM AUTHOR]
- Published
- 2021
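To illustrate the association measure named in the abstract above, the following minimal sketch computes PMI, log2 of p(a, b) / (p(a) p(b)), for pairs of fingerprint bits across a toy compound set. The input format (one set of bit indices per compound), the toy data, and the function name `pmi_profile` are assumptions made for illustration; this is not the authors' profiling code and it does not implement ZRFT.

```python
import math
from itertools import combinations


def pmi_profile(fingerprints):
    """Return {(i, j): PMI in bits} for feature pairs that co-occur at least once.

    `fingerprints` is a list of sets; each set holds the indices of the
    features switched on for one compound (hypothetical input format).
    """
    n = len(fingerprints)
    single = {}  # feature index -> number of compounds containing it
    pair = {}    # (i, j) with i < j -> number of compounds containing both
    for bits in fingerprints:
        for i in bits:
            single[i] = single.get(i, 0) + 1
        for i, j in combinations(sorted(bits), 2):
            pair[(i, j)] = pair.get((i, j), 0) + 1
    return {
        (i, j): math.log2((c / n) / ((single[i] / n) * (single[j] / n)))
        for (i, j), c in pair.items()
    }


# Toy data: four "compounds" described by feature indices 0..3.
toy = [{0, 1}, {0, 1}, {2}, {2, 3}]
profile = pmi_profile(toy)
```

Pairs that never co-occur are simply absent from the result, which sidesteps the logarithm of zero; a real profile would need an explicit policy for such pairs.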
27. PMINR: Pointwise Mutual Information-Based Network Regression – With Application to Studies of Lung Cancer and Alzheimer’s Disease
- Author
-
Weiqiang Lin, Jiadong Ji, Yuchen Zhu, Mingzhuo Li, Jinghua Zhao, Fuzhong Xue, and Zhongshang Yuan
- Subjects
biological networks, pointwise mutual information, regression, lung cancer, Alzheimer’s disease, Genetics, QH426-470 - Abstract
Complex diseases are believed to be the consequence of intracellular network(s) involving a range of factors. An improved understanding of a disease-predisposing biological network could lead to better identification of genes and pathways that confer disease risk and therefore inform drug development. The group difference in biological networks, as is often characterized by graphs of nodes and edges, is attributable to effects of these nodes and edges. Here we introduced pointwise mutual information (PMI) as a measure of the connection between a pair of nodes with either a linear relationship or nonlinear dependence. We then proposed a PMI-based network regression (PMINR) model to differentiate patterns of network changes (in node or edge) linking a disease outcome. Through simulation studies with various sample sizes and inter-node correlation structures, we showed that PMINR can accurately identify these changes with higher power than current methods and be robust to the network topology. Finally, we illustrated, with publicly available data on lung cancer and gene methylation data on aging and Alzheimer’s disease, an evaluation of the practical performance of PMINR. We concluded that PMI is able to capture the generic inter-node correlation pattern in biological networks, and PMINR is a powerful and efficient approach for biological network analysis.
- Published
- 2020
28. PMINR: Pointwise Mutual Information-Based Network Regression – With Application to Studies of Lung Cancer and Alzheimer's Disease.
- Author
-
Lin, Weiqiang, Ji, Jiadong, Zhu, Yuchen, Li, Mingzhuo, Zhao, Jinghua, Xue, Fuzhong, and Yuan, Zhongshang
- Subjects
BIOLOGICAL networks, ALZHEIMER'S disease, LUNG cancer, CANCER genes, DRUG development - Abstract
Complex diseases are believed to be the consequence of intracellular network(s) involving a range of factors. An improved understanding of a disease-predisposing biological network could lead to better identification of genes and pathways that confer disease risk and therefore inform drug development. The group difference in biological networks, as is often characterized by graphs of nodes and edges, is attributable to effects of these nodes and edges. Here we introduced pointwise mutual information (PMI) as a measure of the connection between a pair of nodes with either a linear relationship or nonlinear dependence. We then proposed a PMI-based network regression (PMINR) model to differentiate patterns of network changes (in node or edge) linking a disease outcome. Through simulation studies with various sample sizes and inter-node correlation structures, we showed that PMINR can accurately identify these changes with higher power than current methods and be robust to the network topology. Finally, we illustrated, with publicly available data on lung cancer and gene methylation data on aging and Alzheimer's disease, an evaluation of the practical performance of PMINR. We concluded that PMI is able to capture the generic inter-node correlation pattern in biological networks, and PMINR is a powerful and efficient approach for biological network analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2020
29. Using automatic constructed thesauri instead of dictionaries in the verbal phraseological units validation task.
- Author
-
Pinto, David, Priego, Belém, Singh, Vivek, and Perez, Fernando
- Subjects
*ENCYCLOPEDIAS & dictionaries, *TASKS, *VOCABULARY, *FLIES, *NATURAL language processing - Abstract
Automatic validation of compositionality vs non-compositionality is a very challenging problem in NLP. Very few papers in the literature report results on this particular problem. Recently, some new approaches to this linguistic task have arisen. One of these approaches that has caught our attention is based on what its authors call "lexical domain". In this paper, we analyze the use of Pointwise Mutual Information for constructing thesauri on the fly, which can then be employed instead of dictionaries to determine whether a given phraseological unit is compositional. The experimental results reported in this paper show that this dissimilarity measure (PMI) can effectively be used when determining the compositionality of a given verbal phraseological unit. Moreover, we show that the use of thesauri improves on the results obtained in experiments employing dictionaries, highlighting the use of self-constructed lexical resources which, in fact, take advantage of the vocabulary of the target dataset itself. [ABSTRACT FROM AUTHOR]
- Published
- 2020
30. Estimating spatiotemporal focus of documents using entropy with PMI.
- Author
-
YAŞAR, Damla and TEKİR, Selma
- Subjects
*INFORMATION retrieval, *TIME measurements, *ESTIMATES - Abstract
Many text documents are spatiotemporal in nature, i.e. contents of a document can be mapped to a specific time period or location. For example, a news article about the French Revolution can be mapped to year 1789 as time and France as place. Identifying this time period and location associated with the document can be useful for various downstream applications such as document reasoning or spatiotemporal information retrieval. In this paper, temporal entropy with pointwise mutual information (PMI) is proposed to estimate the temporal focus of a document. PMI is used to measure the association of words with time expressions. Moreover, a word's temporal entropy is considered as a weight to its association with a time point and a single time point with the highest overall score is chosen as the focus time of a document. The proposed method is generic in the sense that it can also be applied for spatial focus estimation of documents. In the case of spatial entropy with PMI, PMI is used to calculate the association between words and place entities. The effectiveness of our proposed methods for spatiotemporal focus estimation is evaluated on diverse datasets of text documents. The experimental evaluation confirms the superiority of our proposed temporal and spatial focus estimation methods. [ABSTRACT FROM AUTHOR]
- Published
- 2020
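The scoring idea described in the abstract above (PMI between words and time expressions, weighted by each word's temporal entropy, with the top-scoring time point chosen as the focus) can be sketched roughly as follows. The abstract does not give the exact weighting, so the form `1 - normalised entropy` used here is an assumption, as are the function name `focus_time`, the `counts[word][time]` input layout, and the toy numbers.

```python
import math
from collections import defaultdict


def focus_time(doc_words, counts):
    """Pick the time point with the highest entropy-weighted PMI score.

    counts[word][time] holds hypothetical, strictly positive co-occurrence
    counts of a word with a time expression in a background corpus.
    """
    total = sum(c for row in counts.values() for c in row.values())
    time_totals = defaultdict(int)
    for row in counts.values():
        for t, c in row.items():
            time_totals[t] += c
    n_times = len(time_totals)
    max_entropy = math.log2(n_times) if n_times > 1 else 1.0

    scores = defaultdict(float)
    for w in doc_words:
        row = counts.get(w)
        if not row:
            continue  # word never seen with a time expression
        w_total = sum(row.values())
        # Temporal entropy of P(time | word): low entropy = time-specific word.
        entropy = -sum((c / w_total) * math.log2(c / w_total) for c in row.values())
        weight = 1.0 - entropy / max_entropy  # assumed weighting, not from the paper
        for t, c in row.items():
            pmi = math.log2((c / total) / ((w_total / total) * (time_totals[t] / total)))
            scores[t] += weight * pmi
    return max(scores, key=scores.get) if scores else None


# Toy background statistics (made up for illustration).
toy_counts = {
    "bastille": {1789: 8},
    "the": {1789: 5, 1945: 5},
    "war": {1789: 1, 1945: 6},
}
```

With these toy counts, a word spread evenly over all time points (such as "the") receives weight zero, so only time-specific words influence the chosen focus. The same sketch would apply to spatial focus by replacing time points with place entities.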
31. Extracting Users' Explicit Preferences from Free-text using Second Order Co-occurrence PMI in Indian Matrimony.
- Author
-
Tabassum, Nazia and Ahmad, Tanvir
- Subjects
MARRIAGE, RECOMMENDER systems, ORDER - Abstract
This paper presents a corpus-based method for extracting users' explicit preferences from the free-text part of registered user profiles in an Indian matchmaking system using Second Order Co-occurrence PMI (SOC-PMI). In online Indian matrimonial systems, users are asked while registering to provide information about themselves, their family, and the qualities they are looking for in a desired partner, but there is no means to automatically prioritize the desired partner features (which features matter more varies from person to person) except by explicitly asking the user. Extraction of users' preferences from unconstrained attributes has not been explored yet, which motivates this work, and the contribution of this paper is focused on that gap. The methodology explained in this paper automatically prioritizes these features, which can be used to design a Weighted Reciprocal Recommendation model to generate more efficient recommendations. Experimental results show the efficiency of the applied methodology. [ABSTRACT FROM AUTHOR]
- Published
- 2020
32. Sentiment Analysis in Social Streams
- Author
-
Saif, Hassan, Ortega, F. Javier, Fernández, Miriam, Cantador, Iván, Tan, Desney, Editor-in-chief, Vanderdonckt, Jean, Editor-in-chief, Tkalčič, Marko, editor, De Carolis, Berardina, editor, de Gemmis, Marco, editor, Odić, Ante, editor, and Košir, Andrej, editor
- Published
- 2016
33. Evaluating Categorisation in Real Life – An Argument Against Simple but Impractical Metrics
- Author
-
Karlsson, Vide, Herman, Pawel, Karlgren, Jussi, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Fuhr, Norbert, editor, Quaresma, Paulo, editor, Gonçalves, Teresa, editor, Larsen, Birger, editor, Balog, Krisztian, editor, Macdonald, Craig, editor, Cappellato, Linda, editor, and Ferro, Nicola, editor
- Published
- 2016
34. User Profiling by Combining Topic Modeling and Pointwise Mutual Information (TM-PMI)
- Author
-
Wu, Lifang, Wang, Dan, Guo, Cheng, Zhang, Jianan, Chen, Chang wen, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Tian, Qi, editor, Sebe, Nicu, editor, Qi, Guo-Jun, editor, Huet, Benoit, editor, Hong, Richang, editor, and Liu, Xueliang, editor
- Published
- 2016
35. Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding
- Author
-
Sigurdsson, Gunnar A., Varol, Gül, Wang, Xiaolong, Farhadi, Ali, Laptev, Ivan, Gupta, Abhinav, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Leibe, Bastian, editor, Matas, Jiri, editor, Sebe, Nicu, editor, and Welling, Max, editor
- Published
- 2016
36. An information‐theoretic approach to the analysis of location and colocation patterns
- Author
-
van Dam, Alje, Gomez-Lievano, Andres, Neffke, Frank, Frenken, Koen, and Innovation Studies
- Subjects
coagglomeration, relatedness, location quotient, pointwise mutual information, Environmental Science (miscellaneous), Development, revealed comparative advantage - Abstract
The study of location and colocation of economic activities lies at the heart of economic geography and related disciplines, but the indices used to quantify these patterns are often defined ad hoc and lack a clear statistical foundation. We propose a statistical framework to quantify location and colocation associations of economic activities using information-theoretic measures. We relate the resulting measures to existing measures of revealed comparative advantage, localization, specialization, and coagglomeration and show how different measures derive from the same general framework. To support the use of these measures in hypothesis testing and statistical inference, we develop a Bayesian estimation approach to provide measures of uncertainty and statistical significance of the estimated quantities. We illustrate this framework in an application to an analysis of location and colocation patterns of occupations in US cities.
- Published
- 2022
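One relation between two of the keywords of this record can be checked numerically. Under the textbook definitions, the location quotient of an activity in a place equals p(place, activity) / (p(place) p(activity)), so its logarithm is the PMI of the pair. The sketch below verifies this on made-up counts; the city and occupation names, the numbers, and the function names are illustrative assumptions, and the paper's Bayesian estimation approach is not reproduced.

```python
import math

# Hypothetical employment counts: counts[city][occupation].
counts = {
    "Springfield": {"nurse": 30, "welder": 10},
    "Shelbyville": {"nurse": 20, "welder": 40},
}


def _totals(counts, city, occupation):
    """Return (grand total, city total, occupation total)."""
    total = sum(sum(row.values()) for row in counts.values())
    city_total = sum(counts[city].values())
    occ_total = sum(row.get(occupation, 0) for row in counts.values())
    return total, city_total, occ_total


def location_quotient(counts, city, occupation):
    """Share of the occupation in the city divided by its overall share."""
    total, city_total, occ_total = _totals(counts, city, occupation)
    return (counts[city][occupation] / city_total) / (occ_total / total)


def pmi(counts, city, occupation):
    """Natural-log PMI of the (city, occupation) pair."""
    total, city_total, occ_total = _totals(counts, city, occupation)
    p_joint = counts[city][occupation] / total
    return math.log(p_joint / ((city_total / total) * (occ_total / total)))
```

For example, `location_quotient(counts, "Springfield", "nurse")` and `math.exp(pmi(counts, "Springfield", "nurse"))` should agree up to floating-point error.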
37. An Information-Theoretic Approach to Detect the Associations of GPS-Tracked Heifers in Pasture
- Author
-
Cornelia Meckbach, Sabrina Elsholz, Caroline Siede, and Imke Traulsen
- Subjects
social networks, pointwise mutual information, association measure, information theory, sensor-tracked animals, Chemical technology, TP1-1185 - Abstract
Sensor technologies, such as the Global Navigation Satellite System (GNSS), produce huge amounts of data by tracking animal locations with high temporal resolution. Due to this high resolution, all animals show at least some co-occurrences, and the pure presence or absence of co-occurrences is not satisfactory for social network construction. Further, tracked animal contacts contain noise due to measurement errors or random co-occurrences. To identify significant associations, null models are commonly used, but the determination of an appropriate null model for GNSS data by maintaining the autocorrelation of tracks is challenging, and the construction is time and memory consuming. Bioinformaticians encounter phylogenetic background and random noise on sequencing data. They estimate this noise directly on the data by using the average product correction procedure, a method applied to information-theoretic measures. Using Global Positioning System (GPS) data of heifers in a pasture, we performed a proof of concept that this approach can be transferred to animal science for social network construction. The approach outputs stable results for up to 30% missing data points, and the predicted associations were in line with those of the null models. The effect of different distance thresholds for contact definition was marginal, but animal activity strongly affected the network structure.
- Published
- 2021
38. Ode to a Keatsian Turn: Creating Meaningful and Poetic Instances of Rhetorical Forms
- Author
-
Veale, Tony, Kühnberger, Kai-Uwe, Series editor, Besold, Tarek R., editor, Schorlemmer, Marco, editor, and Smaill, Alan, editor
- Published
- 2015
39. Probabilistic Segmentation of Musical Sequences Using Restricted Boltzmann Machines
- Author
-
Lattner, Stefan, Grachten, Maarten, Agres, Kat, Cancino Chacón, Carlos Eduardo, Goebel, Randy, Series editor, Tanaka, Yuzuru, Series editor, Wahlster, Wolfgang, Series editor, Collins, Tom, editor, Meredith, David, editor, and Volk, Anja, editor
- Published
- 2015
40. Information Extraction for Learning Expressive Ontologies
- Author
-
Petrucci, Giulio, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Gandon, Fabien, editor, Sabou, Marta, editor, Sack, Harald, editor, d’Amato, Claudia, editor, Cudré-Mauroux, Philippe, editor, and Zimmermann, Antoine, editor
- Published
- 2015
41. Learning Focused Hierarchical Topic Models with Semi-Supervision in Microblogs
- Author
-
Slutsky, Anton, Hu, Xiaohua, An, Yuan, Goebel, Randy, Series editor, Tanaka, Yuzuru, Series editor, Wahlster, Wolfgang, Series editor, Cao, Tru, editor, Lim, Ee-Peng, editor, Zhou, Zhi-Hua, editor, Ho, Tu-Bao, editor, Cheung, David, editor, and Motoda, Hiroshi, editor
- Published
- 2015
42. Twitter Sentiment Detection via Ensemble Classification Using Averaged Confidence Scores
- Author
-
Hagen, Matthias, Potthast, Martin, Büchner, Michel, Stein, Benno, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Kobsa, Alfred, Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Hanbury, Allan, editor, Kazai, Gabriella, editor, Rauber, Andreas, editor, and Fuhr, Norbert, editor
- Published
- 2015
43. Improving multi-target cooperative tracking guidance for UAV swarms using multi-agent reinforcement learning
- Author
-
Jie Li, Lincheng Shen, Zhihong Liu, and Wenhong Zhou
- Subjects
Computational complexity theory, Artificial neural network, Computer science, business.industry, Mechanical Engineering, Aerospace Engineering, ComputerApplications_COMPUTERSINOTHERSYSTEMS, Dot product, Pointwise mutual information, ComputingMethodologies_ARTIFICIALINTELLIGENCE, Regularization (mathematics), Term (time), Reinforcement learning, Artificial intelligence, business, Reciprocal - Abstract
Multi-Target Tracking Guidance (MTTG) in unknown environments has great potential value in applications for Unmanned Aerial Vehicle (UAV) swarms. Although Multi-Agent Deep Reinforcement Learning (MADRL) is a promising technique for learning cooperation, most of the existing methods cannot scale well to decentralized UAV swarms due to their computational complexity or global information requirement. This paper proposes a decentralized MADRL method using the maximum reciprocal reward to learn cooperative tracking policies for UAV swarms. This method reshapes each UAV's reward with a regularization term that is defined as the dot product of the reward vector of all neighbor UAVs and the corresponding dependency vector between the UAV and the neighbors. The dependence between UAVs can be directly captured by the Pointwise Mutual Information (PMI) neural network without complicated aggregation statistics. Then, the experience-sharing Reciprocal Reward Multi-Agent Actor-Critic (MAAC-R) algorithm is proposed to learn the cooperative sharing policy for all homogeneous UAVs. Experiments demonstrate that the proposed algorithm can improve the UAVs’ cooperation more effectively than the baseline algorithms, and can stimulate a rich form of cooperative tracking behaviors of UAV swarms. Besides, the learned policy can better scale to other scenarios with more UAVs and targets.
- Published
- 2022
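The reward reshaping described in the abstract above (own reward plus a regularization term equal to the dot product of the neighbors' rewards and a dependency vector) can be written as a few lines of arithmetic. In this sketch the dependency values are plain numbers supplied by the caller, whereas the paper obtains them from a PMI neural network; the scaling factor `alpha` and the function name `reshaped_reward` are assumptions and do not come from the abstract.

```python
def reshaped_reward(own_reward, neighbor_rewards, dependencies, alpha=1.0):
    """Return own_reward + alpha * <neighbor_rewards, dependencies>.

    neighbor_rewards[k] is the reward of the k-th neighbor UAV and
    dependencies[k] is the (hypothetical) dependency score between this
    UAV and that neighbor. alpha is an assumed weighting hyperparameter.
    """
    if len(neighbor_rewards) != len(dependencies):
        raise ValueError("one dependency value is needed per neighbor")
    reciprocal_term = sum(r * d for r, d in zip(neighbor_rewards, dependencies))
    return own_reward + alpha * reciprocal_term
```

For instance, a UAV with reward 1.0 and two neighbors with rewards 2.0 and -1.0 and dependency scores 0.5 and 0.25 would receive 1.0 + (1.0 - 0.25) = 1.75 when `alpha` is 1.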
44. Analysis of Uncertain Scalar Data with Hixels
- Author
-
Levine, Joshua A., Thompson, David, Bennett, Janine C., Bremer, Peer-Timo, Gyulassy, Attila, Pascucci, Valerio, Pébay, Philippe P., Farin, Gerald, Series editor, Hege, Hans-Christian, Series editor, Hoffman, David, Series editor, Johnson, Christopher R., Series editor, Polthier, Konrad, Series editor, Rumpf, Martin, Series editor, Hansen, Charles D., editor, Chen, Min, editor, Kaufman, Arie E., editor, and Hagen, Hans, editor
- Published
- 2014
45. Construction of competing endogenous RNA networks from paired RNA-seq data sets by pointwise mutual information.
- Author
-
Lan, Chaowang, Peng, Hui, Hutvagner, Gyorgy, and Li, Jinyan
- Subjects
*RNA, *NON-coding RNA, *CROSSTALK, *MICRORNA, *MESSENGER RNA - Abstract
Background: A long noncoding RNA (lncRNA) can act as a competing endogenous RNA (ceRNA) to compete with an mRNA for binding to the same miRNA. Such an interplay between the lncRNA, miRNA, and mRNA is called a ceRNA crosstalk. As an miRNA may have multiple lncRNA targets and multiple mRNA targets, connecting all the ceRNA crosstalks mediated by the same miRNA forms a ceRNA network. Methods have been developed to construct ceRNA networks in the literature. However, these methods have limits because they have not explored the expression characteristics of total RNAs. Results: We proposed a novel method for constructing ceRNA networks and applied it to a paired RNA-seq data set. The first step of the method takes a competition regulation mechanism to derive candidate ceRNA crosstalks. Second, the method combines a competition rule and pointwise mutual information to compute a competition score for each candidate ceRNA crosstalk. Then, ceRNA crosstalks which have significant competition scores are selected to construct the ceRNA network. The key idea, pointwise mutual information, is ideally suitable for measuring the complex point-to-point relationships embedded in the ceRNA networks. Conclusion: Computational experiments and results demonstrate that the ceRNA networks can capture important regulatory mechanism of breast cancer, and have also revealed new insights into the treatment of breast cancer. The proposed method can be directly applied to other RNA-seq data sets for deeper disease understanding. [ABSTRACT FROM AUTHOR]
- Published
- 2019
46. An Efficient Sentiment Analysis Approach for Product Review using Turney Algorithm.
- Author
-
Kanna, P. Rajesh and Pandiaraja, P.
- Subjects
SENTIMENT analysis, PRODUCT reviews, DECISION trees, ALGORITHMS - Abstract
Sentiment analysis can be performed by means of classification, whose most important tasks include text categorization, tone recognition, and image classification. Most existing supervised classification methods are based on traditional statistics, which can provide ideal results. The main aim is to increase accuracy and to report a product's negatives to its manufacturer. The major problem in sentiment analysis is the categorization of sentiment polarity. There are two levels of categorization: review-level and sentence-level. Review-level categorization becomes arduous when we attempt to classify reviews with respect to their specific star-scaled ratings. In addition, review-level categorization has a drawback in implicit-level sentiment analysis. SVM, Naïve Bayesian, and Decision Tree classifiers are the methods mainly used to improve the efficiency of classification. The proposed system uses an Amazon dataset to improve the accuracy of the Turney algorithm. Semantic Orientation (SO) with Pointwise Mutual Information yields better results than other classification methods. A review is labeled positive when its average SO is positive and negative when its average SO is negative. [ABSTRACT FROM AUTHOR]
- Published
- 2019
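The decision rule summarized in the abstract above (label a review by the sign of its average semantic orientation) can be sketched as follows. The SO formula used here is the commonly cited form of Turney's SO-PMI, which compares how often a phrase appears near a positive reference word versus a negative one; the hit counts are hypothetical inputs, the 0.01 smoothing constant and the function names are assumptions, and nothing here reproduces the paper's Amazon experiments.

```python
import math


def so_pmi(hits_near_pos, hits_near_neg, hits_pos, hits_neg, smoothing=0.01):
    """Semantic orientation of a phrase in bits.

    hits_near_pos / hits_near_neg: co-occurrence counts of the phrase with a
    positive / negative reference word (e.g. "excellent" / "poor").
    hits_pos / hits_neg: corpus counts of the reference words themselves.
    smoothing avoids division by zero for phrases with no co-occurrences.
    """
    numerator = (hits_near_pos + smoothing) * hits_neg
    denominator = (hits_near_neg + smoothing) * hits_pos
    return math.log2(numerator / denominator)


def classify_review(phrase_scores):
    """Label a review by the sign of its average phrase-level SO."""
    if not phrase_scores:
        raise ValueError("a review needs at least one scored phrase")
    average = sum(phrase_scores) / len(phrase_scores)
    return "positive" if average > 0 else "negative"
```

A review whose phrases score 3.0 and -1.0 averages 1.0 and would be labeled positive under this rule; how ties at exactly zero should be handled is not stated in the abstract, and the sketch arbitrarily treats them as negative.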
47. An Inquiry into the Process of Organizing and Retrieving Texts...
- Author
-
سعیده انبایی فریمانی, حمید طباطبایی, and مجتبی کفاشان کاخکی
- Subjects
INFORMATION retrieval, NONNEGATIVE matrices, MATRIX decomposition, SUPERVISED learning, ACCESS to information, K-nearest neighbor classification - Abstract
Improvement in information retrieval performance depends on the method of knowledge extraction from large amounts of text information on the web. Text classification is a way of extracting knowledge with supervised machine learning methods. This paper proposes Kullback-Leibler divergence KNN for classifying features extracted through term weighting with the Latent Dirichlet Allocation (LDA) algorithm. LDA is a non-negative matrix factorization method proposed for topic modeling and dimension reduction of high-dimensional feature spaces. In traditional LDA, each component value is assigned using the information retrieval Term Frequency measure. While this weighting method seems very appropriate for information retrieval, it is not clear that it is the best choice for text classification problems. In fact, this weighting method does not leverage the information implicitly contained in the categorization task to represent documents. In this paper, we introduce a new weighting method based on Pointwise Mutual Information for assessing the importance of a word for a specific latent concept; each document is then classified based on its probability distribution over the latent topics. Experimental results show that using the Pointwise Mutual Information measure for term weighting and K-Nearest Neighbor with Kullback-Leibler distance for classification yields an accuracy of 82.5%, the same accuracy as probabilistic deep learning methods. [ABSTRACT FROM AUTHOR]
- Published
- 2019
48. Research on Distribution-Based Chinese Word Representations.
- Author
-
曹学飞, 李济洪, and 王瑞波
- Subjects
*MATRIX decomposition, *ARTIFICIAL neural networks, *TASK performance, *PROBLEM solving, *RESEMBLANCE (Philosophy) - Abstract
To solve the problem of parameters selection in the process of constructing the distributional representations of Chinese words, this paper performed a systematic study. It selected six kinds of parameters for comparison experiments, and evaluated the quality of the distributional representations of Chinese words obtained under different parameter settings on the Chinese semantic similarity task. The experimental results show that, by choosing appropriate parameters, the distributional representations of Chinese words can also get higher performance on the similarity task, moreover, the quality of such high-dimensional distributional representations is even superior to low-dimensional word representations based on neural network or matrix factorization. [ABSTRACT FROM AUTHOR]
- Published
- 2019
49. Multinomial Naïve Bayes using similarity based conditional probability.
- Author
-
Santhi, B. and Brindha, G.R.
- Subjects
*CONDITIONAL probability, *CONTENT mining, *SENTIMENT analysis, *EXPONENTIAL functions, *TEXT mining, *BAYES' theorem - Abstract
The exponential growth of text content shared over the Internet necessitates analysis to convert it into useful information. Research areas such as web mining, opinion mining and text mining focus on studies such as content mining, statistical analysis, prediction, and classification. Multinomial Naïve Bayes (MNB), the state-of-the-art Bayesian classifier, is the fastest and simplest text classifier. The objective of the proposed study is to enhance classification by replacing the conditional probability of the existing MNB with a probability-based frequency computation. A new combination of Pointwise Mutual Information (PMI) and different normalized Term Frequency (TF) measures is used for computing the conditional probability. The new combinations weight words based on the information gain carried by the words related to the document that belongs to a class. The robustness of Similarity based Enhanced Conditional Probability MNB (SECP-MNB) is reflected in classification accuracy measurement. [ABSTRACT FROM AUTHOR]
- Published
- 2019
50. Hierarchical Organization of Collaboratively Constructed Content
- Author
-
Yu, Jianxing, Zha, Zheng-Jun, Chua, Tat-Seng, Gurevych, Iryna, editor, and Kim, Jungi, editor
- Published
- 2013