33 results for "Huang, Jimmy Xiangji"
Search Results
2. Image Translation by Ad CycleGAN for COVID-19 X-Ray Images: A New Approach for Controllable GAN.
- Author
- Liang, Zhaohui, Huang, Jimmy Xiangji, and Antani, Sameer
- Subjects
- X-ray imaging, Generative adversarial networks, Adaptive control systems, X-rays, Standard deviations, Artificial intelligence
- Abstract
We propose a new generative model, the adaptive cycle-consistent generative adversarial network (Ad CycleGAN), to perform image translation between normal and COVID-19 positive chest X-ray images. An independent pre-trained criterion is added to the conventional CycleGAN architecture to exert adaptive control over image translation. The performance of Ad CycleGAN is compared with that of CycleGAN without the external criterion. The quality of the synthetic images is evaluated with quantitative metrics including Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Peak Signal-to-Noise Ratio (PSNR), Universal Image Quality Index (UIQI), Visual Information Fidelity (VIF), Fréchet Inception Distance (FID), and translation accuracy. The experimental results indicate that the synthetic images generated by either CycleGAN or Ad CycleGAN have lower MSE and RMSE, and higher PSNR, UIQI, and VIF scores, in homogeneous image translation (i.e., Y → Y) than in heterogeneous image translation (i.e., X → Y). The synthetic images produced by Ad CycleGAN through heterogeneous image translation have a significantly higher FID score than those of CycleGAN (p < 0.01). The image translation accuracy of Ad CycleGAN is higher than that of CycleGAN when normal images are converted to COVID-19 positive images (p < 0.01). We therefore conclude that Ad CycleGAN with the independent criterion can improve the accuracy of GAN image translation. The new architecture offers more control over image synthesis and can help address the common class-imbalance issue in machine learning and artificial intelligence applications with medical images. [ABSTRACT FROM AUTHOR]
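The image-quality metrics named in this abstract (MSE, RMSE, PSNR) have standard textbook definitions; the sketch below is a generic illustration of those formulas on flattened pixel lists, not the authors' evaluation code.

```python
import math

def mse(x, y):
    """Mean Squared Error between two equal-length pixel sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def rmse(x, y):
    """Root Mean Squared Error."""
    return math.sqrt(mse(x, y))

def psnr(x, y, max_val=255.0):
    """Peak Signal-to-Noise Ratio in dB; higher means the images are closer."""
    m = mse(x, y)
    return math.inf if m == 0 else 10.0 * math.log10(max_val ** 2 / m)

# Toy example: a flattened 2x2 "real" image vs. a slightly perturbed "synthetic" one
real = [10, 200, 30, 250]
fake = [12, 198, 33, 245]
print(rmse(real, fake), psnr(real, fake))
```

UIQI, VIF, and FID are more involved (structural and distributional comparisons) and are omitted here.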
- Published
- 2022
3. Bert-QAnet: BERT-encoded hierarchical question-answer cross-attention network for duplicate question detection.
- Author
- Zhao, Xuan and Huang, Jimmy Xiangji
- Subjects
- Communities, Popularity, Information sharing
- Abstract
Community Question Answering (CQA) provides platforms for users with different backgrounds to share information and knowledge. With the increasing popularity of CQA, more and more question-answer (Q-A) pairs, among them numerous duplicates, have accumulated. Many researchers have therefore focused on detecting duplicate questions in CQA. However, most existing techniques use only the questions to solve the duplicate question detection task, while the paired answers, which may also contain useful information, are not considered. In this paper, we propose a BERT-encoded Hierarchical Question-Answer Cross-Attention Network (Bert-QAnet) for detecting duplicate questions. Our model applies BERT to encode text and extract text features. We further use cross-attention to integrate word-level features of both questions and answers, and inner attention to capture the interaction between a question and its answer. Hence, Bert-QAnet makes full use of the semantic information in paired answers at both the word level and the sentence level. We evaluate our model on two datasets, the Yahoo! Answers dataset and the Stack Overflow dataset; to meet the special requirements of this study, both datasets are extended with paired answers. Experimental results demonstrate that our proposed model achieves state-of-the-art performance. [ABSTRACT FROM AUTHOR]
- Published
- 2022
4. A simple kernel co‐occurrence‐based enhancement for pseudo‐relevance feedback.
- Author
- Pan, Min, Huang, Jimmy Xiangji, He, Tingting, Mao, Zhiming, Ying, Zhiwei, and Tu, Xinhui
- Subjects
- Algorithms, Conceptual structures, Database searching, Information retrieval, Quality assurance
- Abstract
Pseudo-relevance feedback is a well-studied query expansion technique in which the top-ranked documents in an initial set of retrieval results are assumed to be relevant, and expansion terms are then extracted from those documents. When selecting expansion terms, most traditional models do not simultaneously consider term frequency and the co-occurrence relationships between candidate terms and query terms. Intuitively, however, a term that co-occurs more often with a query term is more likely to be related to the query topic. In this article, we propose a kernel co-occurrence-based framework that enhances retrieval performance by integrating term co-occurrence information into the Rocchio model and a relevance language model (RM3), yielding a kernel co-occurrence-based Rocchio method (KRoc) and a kernel co-occurrence-based RM3 method (KRM3). In our framework, co-occurrence information is incorporated into both the term discrimination power factor and the within-document term weight factor to boost retrieval performance. The results of a series of experiments show that our proposed methods significantly outperform the corresponding strong baselines over all data sets in terms of mean average precision, and over most data sets in terms of P@10. A direct comparison on standard Text REtrieval Conference (TREC) data sets indicates that our proposed methods are at least comparable to state-of-the-art approaches. [ABSTRACT FROM AUTHOR]
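For context, the classic Rocchio expansion that KRoc builds on interpolates the original query vector with the centroid of the top-ranked (pseudo-relevant) documents; the kernel co-occurrence weighting proposed in this paper is not reproduced here. A minimal sketch with illustrative term weights:

```python
from collections import Counter

def rocchio_expand(query_vec, feedback_docs, alpha=1.0, beta=0.75, top_k=5):
    """Classic Rocchio expansion: blend the query vector with the centroid
    of the pseudo-relevant (top-ranked) document vectors, then keep the
    highest-weighted terms as the expanded query."""
    centroid = Counter()
    for doc in feedback_docs:
        for term, w in doc.items():
            centroid[term] += w / len(feedback_docs)
    expanded = Counter()
    for term in set(query_vec) | set(centroid):
        expanded[term] = alpha * query_vec.get(term, 0.0) + beta * centroid[term]
    return dict(expanded.most_common(top_k))

# Toy query and two pseudo-relevant documents (term -> weight)
query = {"kernel": 1.0, "feedback": 1.0}
top_docs = [{"kernel": 0.4, "cooccurrence": 0.6},
            {"feedback": 0.5, "expansion": 0.5}]
print(rocchio_expand(query, top_docs))
```

KRoc would additionally reweight candidate terms by their kernel-smoothed co-occurrence with the query terms before this blending step.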
- Published
- 2020
5. Boosting evolutionary optimization via fuzzy-classification-assisted selection.
- Author
- Zhang, Jinyuan, Huang, Jimmy Xiangji, and Hu, Qinmin Vivian
- Subjects
- Membership functions (fuzzy logic), Evolutionary algorithms
- Abstract
• We treat the solution selection procedure in EAs as a fuzzy classification problem to reduce the number of fitness evaluations (FEs). Selected solutions belong to the 'promising' class, while discarded solutions belong to the 'unpromising' class.
• We propose a fuzzy-classification-assisted selection (FCAS) strategy to decide which solutions to evaluate. Unlike existing classification-based strategies that decide solution evaluations according to predicted labels, we use fuzzy membership degrees, which are more reliable than the labels alone.
• The proposed FCAS strategy is a general algorithmic framework in which different kinds of fuzzy classification models can be applied, and FCAS itself can be applied to different kinds of EAs. We integrate FCAS into two state-of-the-art algorithms on three classical test suites. The experimental results show that the number of FEs can be significantly reduced by FCAS when the same fitness values are achieved.

In evolutionary optimization, solution selection is an important operator, since it normally decides the optimization direction by determining new solutions. Most selection methods are objective fitness-based approaches, which leads to wasted fitness evaluations: some evaluated but unpromising solutions are discarded without contributing useful search information. We are thus motivated to treat solution selection as a classification procedure, where the selected and discarded solutions belong to different classes. However, the difference between 'promising' and 'unpromising' solutions becomes fuzzy as iterations go on. Therefore, we employ fuzzy classification to predict the categories of solutions via a fuzzy membership function, and the predicted results are then used to assist solution selection and reduce the number of fitness evaluations. Finally, we propose a fuzzy-classification-assisted selection (FCAS) strategy to boost evolutionary optimization. FCAS is experimentally integrated into two state-of-the-art algorithms and studied on three test suites. The results reveal the efficiency of FCAS for boosting evolutionary optimization. [ABSTRACT FROM AUTHOR]
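The abstract does not specify the exact fuzzy membership function used by FCAS; one common way to obtain such degrees is the fuzzy c-means-style inverse-distance formula sketched below. The centroids, the fuzzifier `m`, and the 2-D points are illustrative assumptions, not the paper's setup.

```python
import math

def fuzzy_membership(x, centroids, m=2.0):
    """Fuzzy c-means-style membership degrees of point x in each class,
    computed from inverse distances to class centroids (degrees sum to 1)."""
    dists = [math.dist(x, c) for c in centroids]
    if any(d == 0 for d in dists):  # exact hit on a centroid: crisp membership
        return [1.0 if d == 0 else 0.0 for d in dists]
    inv = [d ** (-2.0 / (m - 1)) for d in dists]
    total = sum(inv)
    return [v / total for v in inv]

# Two classes: 'promising' solutions near (0, 0), 'unpromising' near (1, 1)
centroids = [(0.0, 0.0), (1.0, 1.0)]
degrees = fuzzy_membership((0.2, 0.1), centroids)
print(degrees)
```

A selection strategy could then evaluate a candidate's true fitness only when its 'promising' degree exceeds a threshold, saving fitness evaluations on clearly unpromising candidates.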
- Published
- 2020
6. Mining authoritative and topical evidence from the blogosphere for improving opinion retrieval.
- Author
- Huang, Jimmy Xiangji, He, Ben, and Zhao, Jiashu
- Subjects
- Data mining, Blogs, Bloggers, Social networks, Sentiment analysis, Internet users
- Abstract
Highlights
• We mine and utilize authoritative and topical evidence for improving the retrieval performance of opinionated blog posts.
• We build a profile for each blogger and estimate the probability of topical words extracted from the training queries.
• Further, a novel document-based neural matching model is proposed to incorporate different sources of information.
• Our proposed approach does not use additional resources to extract opinion terms, which provides a new and promising avenue.

Abstract: The rise of Internet blogging has created a highly dynamic Web society in which bloggers' views and opinions respond to real-world events. As an emerging research field, blog post opinion retrieval requires finding not only relevant but also opinionated blog posts. Most current solutions rely on a dictionary of sentiment words to identify subjective features in blog posts. In this paper, we propose to utilize novel evidence, namely authoritative and topical evidence, for mining opinions from the blogosphere. We suggest that bloggers interested in controversial topics tend to express opinions in their posts, and that it is therefore beneficial to boost the ranking of blog posts written by such authors. We further improve our approach by extending it with different sources of features, incorporated into a document-based neural matching model. Our experiments on standard test data from the TREC 2006–2008 Blog track opinion finding task show that the proposed approach achieves remarkable improvements over strong baselines. [ABSTRACT FROM AUTHOR]
- Published
- 2018
7. geNov: A new metric for measuring novelty and relevancy in biomedical information retrieval.
- Author
- An, Xiangdong and Huang, Jimmy Xiangji
- Subjects
- Information retrieval, Algorithms, Experimental design, Medical literature, Genomics
- Abstract
For diversity and novelty evaluation in information retrieval, we expect novel documents to be ranked higher than redundant ones, and relevant documents higher than irrelevant ones. We also expect the degree of novelty and relevancy to be acknowledged, and that an evaluation algorithm would reward rankings respecting these expectations. Nevertheless, few research articles in the literature study how to meet such expectations, and even fewer in the field of biomedical information retrieval. In this article, we propose a new metric for novelty and relevancy evaluation in biomedical information retrieval, based on an aspect-level performance measure introduced by the TREC Genomics Track, with formal results showing that the above expectations can be respected under ideal conditions. The empirical evaluation indicates that the proposed metric, geNov, is highly sensitive to the desired characteristics above, and its three parameters are readily tunable for different evaluation preferences. Experimental comparisons with state-of-the-art metrics for novelty and diversity show the advantages of the proposed metric in recognizing ranking quality in terms of novelty, redundancy, relevancy, and irrelevancy, and in its discriminative power. Experiments also reveal that the proposed metric is faster to compute than state-of-the-art metrics. [ABSTRACT FROM AUTHOR]
- Published
- 2017
8. Modeling and Learning Distributed Word Representation with Metadata for Question Retrieval.
- Author
- Zhou, Guangyou and Huang, Jimmy Xiangji
- Subjects
- Metadata, Facilitated learning, Interrogative (grammar), Document type definitions, ARCS Model of Motivational Design
- Abstract
Community question answering (cQA) has become an important research topic due to the popularity of cQA archives on the Web. This paper focuses on addressing the lexical gap problem in question retrieval. Question retrieval in cQA archives aims to find existing questions that are semantically equivalent or relevant to the queried questions; however, the lexical gap poses a new challenge for question retrieval in cQA. In this paper, we propose to model and learn distributed word representations with the metadata of category information within cQA pages for question retrieval, using two novel category-powered models: a basic model called MB-NET and an enhanced model called ME-NET, which can better learn the distributed word representations and alleviate the lexical gap problem. To deal with the variable size of word representation vectors, we employ the Fisher kernel framework to transform them into fixed-length vectors. Experimental results on large-scale English and Chinese cQA data sets show that our proposed approaches significantly outperform state-of-the-art retrieval models for question retrieval in cQA. Moreover, large-scale automatic evaluation experiments show that promising and significant performance improvements can be achieved. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
9. BRBcast: A new approach to belief rule-based system parameter learning via extended causal strength logic.
- Author
- Sun, Jian-Bin, Huang, Jimmy Xiangji, Chang, Lei-Lei, Jiang, Jiang, and Tan, Yue-Jin
- Subjects
- Rule-based programming, Machine learning, Parameter estimation, Approximation theory, Emergency management
- Abstract
The belief rule-based (BRB) system has demonstrated advantages in complex system modeling and evaluation, with strong nonlinear relationship approximation capabilities. BRB parameter learning has proved effective in improving the approximation accuracy of BRB systems; however, running time complexity remains an important challenge for BRB parameter learning efficiency. In this paper, a new approach to BRB parameter learning via extended causal strength (CAST) logic, named BRBcast, is proposed to reduce the complexity of BRB parameter learning while maintaining the approximation accuracy of BRB systems. First, the parameter counts of traditional BRB parameter learning are analyzed to show the necessity of complexity reduction. Then, binary CAST logic is extended to fulfill the requirements of multi-state modeling and evaluation. Based on this analysis, an optimization model for parameter learning with CAST logic is established and applied to reduce BRB parameter learning complexity. In BRBcast, the CAST parameters, instead of the BRB parameters, are trained and translated to construct belief rule bases, which involves fewer parameters than traditional BRB parameter learning approaches. The detailed BRBcast procedure is then presented with the differential evolution (DE) algorithm. Finally, a numerical case and a practical example on pipeline leak detection are investigated to verify the efficiency of BRBcast. The experimental results indicate that BRBcast exhibits superior performance in both reducing BRB parameter learning complexity and ensuring the approximation accuracy of BRB systems, providing a promising avenue for constructing accurate and robust disaster emergency and rapid response systems. [ABSTRACT FROM AUTHOR]
- Published
- 2018
10. A learning to rank approach for quality-aware pseudo-relevance feedback.
- Author
- Ye, Zheng and Huang, Jimmy Xiangji
- Subjects
- Algorithms, Information retrieval, Quality assurance, Research funding
- Abstract
Pseudo-relevance feedback (PRF) has been shown to be effective in ad hoc information retrieval. In traditional PRF methods, top-ranked documents are all assumed to be relevant and are therefore treated equally in the feedback process. However, the performance gain brought by each document differs, as shown in our preliminary experiments. It is thus more reasonable to predict the performance gain brought by each candidate feedback document during PRF. We define the quality level (QL) and use this information to adjust the weights of feedback terms in these documents. Unlike previous work, we make no explicit relevance assumption, and we go beyond merely selecting 'good' documents for PRF. We propose a quality-based PRF framework with two quality-based assumptions. In particular, two strategies, relevance-based QL (RelPRF) and improvement-based QL (ImpPRF), are presented to estimate the QL of each feedback document. Based on this, we select a set of heterogeneous document-level features and apply a learning approach to evaluate the QL of each feedback document. Extensive experiments on standard TREC (Text REtrieval Conference) test collections show that our proposed model performs robustly and significantly outperforms strong baselines. [ABSTRACT FROM AUTHOR]
- Published
- 2016
11. STCA: Utilizing a spatio-temporal cross-attention network for enhancing video person re-identification.
- Author
- Bhuiyan, Amran and Huang, Jimmy Xiangji
- Subjects
- Deep learning, Video coding, Videos
- Abstract
Video-based re-identification (ReID) is a crucial task in computer vision that has drawn increasing attention due to advances in deep learning (DL) and modern computational devices. Despite recent success with CNN architectures, single models (e.g., 2D-CNNs or 3D-CNNs) alone fail to combine temporal information with spatial cues. This is due to uncontrolled surveillance scenarios and variable poses, which lead to inevitable misalignment of ROIs across tracklets, accompanied by occlusion and motion blur. In this context, designing temporal and spatial cues in two different models and combining them can be beneficial for obtaining a global view of a video tracklet: 3D-CNNs encode temporal information, while 2D-CNNs extract spatial or appearance information. In this paper, we propose a Spatio-Temporal Cross Attention (STCA) network that utilizes both 2D-CNNs and 3D-CNNs, computing a cross-attention mapping from the layers of both networks along a person's trajectory to gate the following layers of the 2D-CNN and highlight appearance features relevant for person ReID. Given an input tracklet, the proposed cross attention (CA) captures the salient regions that propagate throughout the tracklet to obtain the global view. This provides a spatio-temporal attention approach that can be dynamically aggregated with the spatial features of 2D-CNNs to perform finer-grained recognition. Additionally, we exploit cosine similarity both in triplet sampling and in calculating the final recognition score. Experimental analyses on three challenging benchmark datasets indicate that integrating spatio-temporal cross attention into state-of-the-art video ReID backbone CNN architectures improves their recognition accuracy.
• We propose a Spatio-Temporal Cross Attention (STCA) network to generate cross-guided attention for video re-identification.
• The proposed STCA adopts both 2D- and 3D-CNNs to capture common salient features that are consistent throughout space and time.
• The generated attention is used to gate the 2D-CNN, enhancing its means of fine-grained recognition to address misalignment.
• Optimizing STCA using cosine distance for hard triplet mining leads to faster convergence and better recognition accuracy. [ABSTRACT FROM AUTHOR]
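The cosine similarity used above for triplet sampling and for the final matching score has the standard definition; below is a toy sketch with made-up 4-dimensional embeddings (the real model compares high-dimensional CNN features).

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors; 1.0 = identical direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Matching scores between toy tracklet embeddings: the positive (same person)
# should score higher than the negative (different person)
anchor = [0.2, 0.9, 0.1, 0.4]
positive = [0.25, 0.85, 0.12, 0.38]
negative = [0.9, 0.1, 0.7, 0.0]
print(cosine_similarity(anchor, positive), cosine_similarity(anchor, negative))
```

For hard triplet mining, one would select, per anchor, the positive with the lowest and the negative with the highest cosine similarity.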
- Published
- 2022
12. Mining query-driven contexts for geographic and temporal search.
- Author
- Daoud, Mariam and Huang, Jimmy Xiangji
- Subjects
- Temporal databases, Mines & mineral resources, Mineral industries, Probabilistic number theory
- Abstract
The explosive growth of geographic and temporal data has attracted much attention in the information retrieval (IR) field. Since geographic and temporal information is often available only in unstructured text, the IR task becomes a non-straightforward process. In this article, we propose a novel geo-temporal context mining approach and a geo-temporal ranking model for improving search performance. Queries implicitly target 'what', 'when', and 'where' components. We model geographic and temporal query-dependent frequent patterns, called contexts, which are derived by extracting and ranking geographic and temporal entities found in pseudo-relevance feedback documents. Two methods are proposed for inferring the query-dependent contexts: (1) a frequency-based statistical approach and (2) a frequent pattern mining approach using a support threshold. The derived geographic and temporal query contexts are then exploited in a probabilistic ranking model. Finally, geographic, temporal, and content-based scores are combined to improve geo-temporal search performance. We evaluate our approach on the New York Times news collection. The experimental results show that our proposed approach significantly outperforms a well-known baseline, the probabilistic BM25 ranking model, as well as state-of-the-art approaches in the field. [ABSTRACT FROM AUTHOR]
- Published
- 2013
13. A Survival Modeling Approach to Biomedical Search Result Diversification Using Wikipedia.
- Author
- Yin, Xiaoshi, Huang, Jimmy Xiangji, Li, Zhoujun, and Zhou, Xiaofeng
- Subjects
- Information retrieval, Biomedical engineering, Query languages (computer science), Information filtering
- Abstract
In this paper, we propose a survival modeling approach to promoting ranking diversity for biomedical information retrieval. The proposed approach is concerned with finding relevant documents that deliver more different aspects of a query. First, two probabilistic models derived from survival analysis theory are proposed for measuring aspect novelty. Second, a new method using Wikipedia to detect the aspects covered by retrieved documents is presented. Third, an aspect filter based on a two-stage model is introduced; it ranks the detected aspects in decreasing order of the probability that an aspect is generated by the query. Finally, the relevance and novelty of retrieved documents are combined at the aspect level for reranking. Experiments conducted on the TREC 2006 and 2007 Genomics collections demonstrate the effectiveness of the proposed approach in promoting ranking diversity for biomedical information retrieval. Moreover, we further evaluate our approach in the Web retrieval environment. The evaluation results on the ClueWeb09-T09B collection show that our approach achieves promising performance improvements. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
14. Modeling geographic, temporal, and proximity contexts for improving geotemporal search.
- Author
- Daoud, Mariam and Huang, Jimmy Xiangji
- Subjects
- Information retrieval, Computer software, Newspapers, Research funding, Time, Search engines
- Abstract
Traditional information retrieval (IR) systems show significant limitations in returning relevant documents that satisfy the user's information needs. In particular, when answering geographic and temporal user queries, the IR task becomes a non-straightforward process because the available geographic and temporal information is often unstructured. In this article, we propose a geotemporal search approach that models and exploits geographic and temporal query context evidence, which refers to the implicit, multi-varying geographic and temporal intents behind a query. Modeling geographic and temporal query contexts is based on extracting and ranking geographic and temporal keywords found in pseudo-relevance feedback (PRF) documents for a given query. Our geotemporal search approach exploits the geographic and temporal query contexts separately in a probabilistic ranking model and jointly in a proximity ranking model. Our hypothesis is that geographic and temporal expressions tend to co-occur within a document, and that the closer they are in the document, the more relevant the document is. Finally, geographic, temporal, and proximity scores are combined according to a linear combination formula. An extensive experimental evaluation conducted on a portion of the New York Times news collection and the TREC 2004 robust retrieval track collection shows that our geotemporal approach significantly outperforms a well-known baseline search and the best-known geotemporal search approaches in the domain. Finally, an in-depth analysis shows a positive correlation between the geographic and temporal query sensitivity and retrieval performance. We also find that geotemporal distance generally has a positive impact on retrieval performance. [ABSTRACT FROM AUTHOR]
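The final ranking step described in this abstract is a linear combination of the component scores; a minimal sketch is below. The weights are illustrative assumptions, not the paper's tuned values.

```python
def geotemporal_score(geo, temporal, proximity, weights=(0.4, 0.3, 0.3)):
    """Linear combination of geographic, temporal, and proximity scores.
    Weights are assumed non-negative and summing to 1 (illustrative values)."""
    w_geo, w_temp, w_prox = weights
    return w_geo * geo + w_temp * temporal + w_prox * proximity

# A document scoring well geographically but poorly temporally
print(geotemporal_score(geo=0.9, temporal=0.2, proximity=0.5))
```

In practice the weights would be tuned on held-out queries, e.g. by grid search over the simplex.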
- Published
- 2013
15. Mining a multilingual association dictionary from Wikipedia for cross-language information retrieval.
- Author
- Ye, Zheng, Huang, Jimmy Xiangji, He, Ben, and Lin, Hongfei
- Subjects
- Information retrieval, Internet, Language & languages, Research funding, Translations, Information resources, Reference sources
- Abstract
Wikipedia is characterized by its dense link structure and a large number of articles in different languages, which make it a notable Web corpus for knowledge extraction and mining, in particular for mining multilingual associations. In this paper, motivated by a psychological theory of word meaning, we propose a graph-based approach to constructing a cross-language association dictionary (CLAD) from Wikipedia, which can be used in a variety of cross-language access and processing applications. In order to evaluate the quality of the mined CLAD, and to demonstrate how it can be used in practice, we explore two applications of the mined CLAD to cross-language information retrieval (CLIR): first, we use the mined CLAD to conduct cross-language query expansion; second, we use it to filter out translation candidates with low translation probabilities. Experimental results on a variety of standard CLIR test collections show that CLIR retrieval performance can be substantially improved with these two applications of the CLAD, which indicates that the mined CLAD is of sound quality. [ABSTRACT FROM AUTHOR]
- Published
- 2012
16. Incorporating rich features to boost information retrieval performance: A SVM-regression based re-ranking approach
- Author
- Ye, Zheng, Huang, Jimmy Xiangji, and Lin, Hongfei
- Subjects
- Information retrieval, Statistical weighting, Ranking, Heuristic algorithms, Support vector machines, Regression analysis, Query (information retrieval system), Natural language processing
- Abstract
Abstract: Document ranking is an essential problem in the field of information retrieval (IR). Traditional weighting models such as BM25 and language models can only take advantage of query terms, yet IR is a complex process that may be affected by a range of heterogeneous features. It is therefore necessary to refine first-pass retrieval results by taking rich features into account. Traditional heuristic re-ranking approaches can only exploit a small number of homogeneous features that may affect retrieval performance. In this paper, we propose and evaluate a regression-based document re-ranking approach for IR, in which an SVM regression model learns a re-ranking function automatically. Under this regression-based framework, we can take advantage of rich features to re-rank the documents retrieved in the first pass by traditional weighting models. We conduct a series of experiments on four standard IR collections in two different languages. The experimental results show that our proposed approach significantly improves retrieval performance over the first-pass retrieval. Moreover, by refining the first-pass retrieved document set, traditional pseudo-relevance feedback approaches can also be enhanced. [Copyright Elsevier]
- Published
- 2011
17. Modeling term proximity for probabilistic information retrieval models
- Author
- He, Ben, Huang, Jimmy Xiangji, and Zhou, Xiaofeng
- Subjects
- Information storage & retrieval systems, Combinatorial probabilities, Probabilistic number theory, Poisson processes, Query (information retrieval system), Markov random fields, Data distribution, Network PC (computer)
- Abstract
Abstract: Proximity among query terms has been found useful for improving retrieval performance. However, its application to classical probabilistic information retrieval models, such as Okapi's BM25, remains a challenging research problem. In this paper, we propose to improve the classical BM25 model by utilizing term proximity evidence. Four novel methods are proposed to model the proximity between query terms: a window-based N-gram counting method, and survival analysis over three different statistics, namely a Poisson process, an exponential distribution, and an empirical function. Through extensive experiments on standard TREC collections, our proposed proximity-based BM25 model, called BM25P, is compared to strong state-of-the-art baselines, including the original unigram BM25 model, the Markov Random Field model, and the positional language model. According to the experimental results, the window-based N-gram counting method and survival analysis over an exponential distribution are the most effective of the four proposed methods, leading to marked improvements over the baselines. This shows that the use of term proximity considerably enhances the retrieval effectiveness of classical probabilistic models; we therefore recommend deploying a term proximity component in retrieval systems that employ probabilistic models. [Copyright Elsevier]
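For reference, the unigram BM25 baseline that BM25P extends scores a document as a sum of per-term contributions; below is a compact, illustrative sketch on a toy corpus with common k1/b defaults, not the paper's BM25P model.

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Unigram Okapi BM25 score of one document for a query.
    `doc` is a list of tokens; `corpus` is a list of such token lists."""
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs  # average document length
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)              # document frequency
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))  # BM25 idf
        tf = doc.count(term)                                  # term frequency
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [["term", "proximity", "matters"],
          ["okapi", "bm25", "model"],
          ["probabilistic", "retrieval", "model"]]
print(bm25_score(["bm25", "model"], corpus[1], corpus))
```

BM25P would add a proximity component on top of these per-term scores, rewarding documents in which the query terms occur close together.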
- Published
- 2011
18. Domain Adaptation with Pre-trained Transformers for Query-Focused Abstractive Text Summarization.
- Author
- Laskar, Md Tahmid Rahman, Hoque, Enamul, and Huang, Jimmy Xiangji
- Subjects
- Natural language processing, Supervised learning, Text summarization
- Abstract
The Query-Focused Text Summarization (QFTS) task aims at building systems that generate the summary of the text document(s) based on the given query. A key challenge in addressing this task is the lack of large labeled data for training the summarization model. In this article, we address this challenge by exploring a series of domain adaptation techniques. Given the recent success of pre-trained transformer models in a wide range of natural language processing tasks, we utilize such models to generate abstractive summaries for the QFTS task for both single-document and multi-document scenarios. For domain adaptation, we apply a variety of techniques using pre-trained transformer-based summarization models including transfer learning, weakly supervised learning, and distant supervision. Extensive experiments on six datasets show that our proposed approach is very effective in generating abstractive summaries for the QFTS task while setting a new state-of-the-art result in several datasets across a set of automatic and human evaluation metrics. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
19. SK-GCN: Modeling Syntax and Knowledge via Graph Convolutional Network for aspect-level sentiment classification.
- Author
-
Zhou, Jie, Huang, Jimmy Xiangji, Hu, Qinmin Vivian, and He, Liang
- Subjects
- *
CONVOLUTIONAL neural networks , *SENTIMENT analysis , *TREE graphs , *CLASSIFICATION , *MULTICASTING (Computer networks) , *USER-generated content - Abstract
Aspect-level sentiment classification is a fundamental subtask of fine-grained sentiment analysis. Syntactic information and commonsense knowledge are important and useful for aspect-level sentiment classification, yet only a limited number of studies have explored incorporating them via flexible graph convolutional networks (GCNs) for this task. In this paper, we propose a new Syntax- and Knowledge-based Graph Convolutional Network (SK-GCN) model for aspect-level sentiment classification, which leverages the syntactic dependency tree and commonsense knowledge via GCN. In particular, to enhance the representation of the sentence toward the given aspect, we develop two strategies to model the syntactic dependency tree and commonsense knowledge graph, namely SK-GCN1 and SK-GCN2. SK-GCN1 models the dependency tree and knowledge graph via a Syntax-based GCN (S-GCN) and a Knowledge-based GCN (K-GCN) independently, while SK-GCN2 models them jointly. We also apply pre-trained BERT to this task and obtain new state-of-the-art results. Extensive experiments on five benchmark datasets demonstrate that our approach effectively improves the performance of aspect-level sentiment classification compared with state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
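The GCN building block used by models like the one above is a single propagation step over a graph such as a dependency tree. The sketch below shows the standard symmetrically normalized GCN layer in plain NumPy; the function name and the tiny example graph are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def gcn_layer(adj, h, w):
    # One GCN layer: add self-loops, symmetrically normalize the
    # adjacency (D^-1/2 (A + I) D^-1/2), apply a linear transform,
    # then a ReLU nonlinearity.
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a.sum(axis=1)))
    a_norm = d_inv_sqrt @ a @ d_inv_sqrt
    return np.maximum(a_norm @ h @ w, 0.0)
```

In a syntax-based GCN, `adj` would encode edges of the dependency tree of the sentence; in a knowledge-based GCN, edges of the commonsense knowledge graph around the aspect.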
20. MASAD: A large-scale dataset for multimodal aspect-based sentiment analysis.
- Author
-
Zhou, Jie, Zhao, Jiabao, Huang, Jimmy Xiangji, Hu, Qinmin Vivian, and He, Liang
- Subjects
- *
SENTIMENT analysis , *USER-generated content , *DEEP learning - Abstract
Aspect-based sentiment analysis has achieved great success in recent years. Most existing work focuses on determining the sentiment polarity of a given aspect according to the given text, while little attention has been paid to visual information and multimodal content for aspect-based sentiment analysis. Multimodal content is becoming increasingly popular on mainstream online social platforms and can help better extract user sentiments toward a given aspect. Only a few studies have focused on this new task: Multimodal Aspect-based Sentiment Analysis (MASA), which performs aspect-based sentiment analysis by integrating both texts and images. In this paper, we propose a multimodal interaction model for MASA to learn the relationships among the text, image and aspect via interaction layers and adversarial training. Additionally, we build a new large-scale dataset for this task, named MASAD, which involves seven domains and 57 aspect categories with 38k image-text pairs. Extensive experiments have been conducted on the proposed dataset to provide several baselines for this task. Although our models obtain significant improvements, empirical results show that MASA is more challenging than textual aspect-based sentiment analysis, indicating that MASA remains a challenging open problem that requires further efforts. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
21. Position-aware hierarchical transfer model for aspect-level sentiment classification.
- Author
-
Zhou, Jie, Chen, Qin, Huang, Jimmy Xiangji, Hu, Qinmin Vivian, and He, Liang
- Subjects
- *
KNOWLEDGE transfer , *CLASSIFICATION , *INFORMATION modeling - Abstract
Recently, attention-based neural networks (NNs) have been widely used for aspect-level sentiment classification (ASC). Most neural models focus on incorporating the aspect representation into attention; however, the position information of each aspect has not been studied well. Furthermore, the existing ASC datasets are relatively small owing to labor-intensive labeling, which largely limits the performance of NNs. In this paper, we propose a position-aware hierarchical transfer (PAHT) model that models the position information at multiple levels and enhances ASC performance by transferring hierarchical knowledge from a resource-rich sentence-level sentiment classification (SSC) dataset. We first present aspect-based positional attention at the word and segment levels to capture more salient information toward a given aspect. To make up for the limited data for ASC, we devise three sampling strategies to select related instances from the large-scale SSC dataset for pre-training, and transfer the learned knowledge into ASC at four levels: embedding, word, segment and classifier. Extensive experiments on four benchmark datasets demonstrate that the proposed model is effective in improving the performance of ASC. In particular, our model outperforms the state-of-the-art approaches in terms of accuracy on all the datasets considered. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
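The core intuition of position-aware attention, that words closer to the aspect span should matter more, can be sketched with a simple distance-based decay. This is a hypothetical illustration of the idea, not the PAHT model's actual attention mechanism; the exponential form and `decay` parameter are assumptions.

```python
import math

def position_weights(n_tokens, aspect_start, aspect_end, decay=0.1):
    # Assign each token a weight that decays exponentially with its
    # distance to the aspect span; tokens inside the span get weight 1.
    weights = []
    for i in range(n_tokens):
        if aspect_start <= i <= aspect_end:
            d = 0
        elif i < aspect_start:
            d = aspect_start - i
        else:
            d = i - aspect_end
        weights.append(math.exp(-decay * d))
    return weights
```

In a full model, such weights would scale the attention scores before the softmax, biasing the sentence representation toward the aspect's local context.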
22. TAKer: Fine-Grained Time-Aware Microblog Search with Kernel Density Estimation.
- Author
-
Chen, Qin, Hu, Qinmin, Huang, Jimmy Xiangji, and He, Liang
- Subjects
- *
KERNEL functions , *MICROBLOGS , *INFORMATION retrieval , *SEARCH algorithms , *FEEDBACK control systems - Abstract
Temporal information has been widely used to promote information retrieval (IR) performance, especially for microblog search, which usually prefers the latest news and events. Previous studies mainly focused on incorporating document-level temporal information into retrieval, while the temporal relevance of each query word was not well investigated. In this paper, we propose a word temporal predictor that characterizes word-level temporal relevance by fine-grained time-aware kernel density estimation over the feedback documents. In addition, we present a fine-grained time-aware framework that integrates the proposed word temporal predictor with the traditional document temporal predictor for retrieval. Finally, we incorporate the framework into two state-of-the-art retrieval models, namely the language model (LM) and BM25. The experimental results on the TREC 2011-2014 Microblog collections show that our proposed word temporal predictor is effective in boosting retrieval performance within both the LM and BM25 frameworks. In particular, we achieve significant improvements over strong baselines with optimized settings in most cases. Furthermore, our fine-grained time-aware models with the word temporal predictor are comparable to, if not better than, the state-of-the-art temporal retrieval models. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
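A word-level temporal predictor of the kind described above can be sketched as a kernel density estimate over the timestamps of feedback documents containing the word. The Gaussian kernel, the bandwidth, and the function names below are illustrative assumptions, not the paper's exact estimator.

```python
import math

def gaussian_kde(t, samples, bandwidth=1.0):
    # Kernel density estimate at time t from a list of observed
    # timestamps, using a Gaussian kernel.
    if not samples:
        return 0.0
    z = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    return z * sum(math.exp(-0.5 * ((t - s) / bandwidth) ** 2)
                   for s in samples)

def word_temporal_score(word, query_time, feedback_docs, bandwidth=1.0):
    # Score a query word by how densely it occurs near the query time;
    # feedback_docs is a list of (timestamp, token_list) pairs.
    times = [ts for ts, toks in feedback_docs if word in toks]
    return gaussian_kde(query_time, times, bandwidth)
```

A word whose feedback occurrences cluster near the query time then receives a higher temporal weight than one whose occurrences are spread far away.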
23. Modeling Queries with Contextual Snippets for Information Retrieval.
- Author
-
Chen, Qin, Hu, Qinmin, Huang, Jimmy Xiangji, and He, Liang
- Subjects
- *
DATA distribution , *EXPERIMENTAL programs , *FINANCIAL statistics , *INFORMATION retrieval , *DOCUMENTATION - Abstract
Query expansion under the pseudo-relevance feedback (PRF) framework has been extensively studied in information retrieval. However, most expansion methods are based mainly on the statistics of single terms, which can generate plenty of irrelevant query terms and decrease retrieval performance. To alleviate this problem, we propose an approach that feeds PRF-based contextual snippets into a context-aware topic model to enhance query representations. Specifically, instead of selecting a series of independent terms, we make full use of the query's contextual information and focus on snippets of length n in the PRF documents. Furthermore, we propose a context-aware topic (CAT) model to mine the topic distributions of the query-relevant snippets, namely, fine contextual snippets. In contrast to traditional topic models that infer topics from the whole corpus, we establish a bridge between the snippets and the corresponding PRF documents, which can be used to model the topics more precisely and efficiently. Finally, the topic distributions of the fine snippets are used for context-aware and topic-sensitive query representations. To evaluate the performance of our approach, we integrate the obtained queries into a topic-based hybrid retrieval model and conduct extensive experiments on various TREC collections. The experimental results show that our query-modeling approach is more effective in boosting retrieval performance than the state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
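The first stage of the approach above, collecting length-n contextual snippets from PRF documents, can be sketched directly. The selection rule below (keep any window containing at least one query term) is an assumed toy version of "query-relevant snippets"; the paper's actual relevance filtering may be more refined.

```python
def extract_snippets(doc_tokens, query_terms, n=3):
    # Slide a length-n window over a PRF document and keep every
    # window that contains at least one query term.
    qset = set(query_terms)
    snippets = []
    for i in range(len(doc_tokens) - n + 1):
        window = doc_tokens[i:i + n]
        if qset & set(window):
            snippets.append(tuple(window))
    return snippets
```

The resulting snippets, rather than the whole document, would then be the units whose topic distributions the CAT model infers.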
24. An ontological account of flow-control components in BPMN process models.
- Author
-
Tan, Xing, Gu, Yilan, and Huang, Jimmy Xiangji
- Subjects
- *
BUSINESS process management , *SEMANTICS , *FIRST-order logic , *AUTOMATIC theorem proving , *PETRI nets - Abstract
The Business Process Model and Notation (BPMN) has been widely adopted in recent years as one of the standard languages for the visual description of business processes. BPMN, however, does not include a formal semantics, which is required for the formal verification and validation of the behaviors of BPMN models. Towards bridging this gap using first-order logic, we present in this paper an ontological/formal account of flow-control components in BPMN, using the Situation Calculus and Petri nets. More precisely, we use SCOPE (Situation Calculus Ontology of PEtri nets), developed in our previous work, to formally describe the flow-control-related basic components of BPMN (i.e., events, tasks, and gateways) as SCOPE-based procedures. These components are first mapped from BPMN onto Petri nets. Our approach differs from other major approaches for assigning semantics to BPMN (e.g., those applying communicating sequential processes or abstract state machines) in the following aspects. Firstly, the approach supports the direct application of automated theorem proving for checking theory consistency or verifying dynamical properties of systems. Secondly, it defines concepts through the aggregation of more basic concepts in a hierarchical way, so the adoptability and extensibility of the models are presumably high. Thirdly, the Petri-net-based implementation is completely encapsulated, such that the interfaces between the system and its users are defined entirely within a BPMN context. Finally, the approach can easily be extended to adopt the concept of time. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
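The BPMN-to-Petri-net mapping mentioned above rests on the standard token-game semantics, which is compact enough to sketch. The dict-based encoding below is an illustrative assumption, not SCOPE's Situation Calculus axiomatization; it shows a parallel (AND-split) gateway modeled as a single transition with one input place and two output places.

```python
def enabled(marking, transition):
    # A transition is enabled when every input place holds
    # at least the required number of tokens.
    pre, _post = transition
    return all(marking.get(p, 0) >= n for p, n in pre.items())

def fire(marking, transition):
    # Fire an enabled transition: consume tokens from input places,
    # produce tokens on output places, return the new marking.
    pre, post = transition
    if not enabled(marking, transition):
        raise ValueError("transition not enabled")
    m = dict(marking)
    for p, n in pre.items():
        m[p] -= n
    for p, n in post.items():
        m[p] = m.get(p, 0) + n
    return m
```

Under such a mapping, verifying a BPMN model reduces to reasoning about which markings are reachable by sequences of firings.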
26. GFCNet: Utilizing graph feature collection networks for coronavirus knowledge graph embeddings.
- Author
-
Xie, Zhiwen, Zhu, Runjie, Liu, Jin, Zhou, Guangyou, Huang, Jimmy Xiangji, and Cui, Xiaohui
- Subjects
- *
KNOWLEDGE graphs , *ARTIFICIAL intelligence , *COVID-19 pandemic , *COVID-19 , *MACHINE learning - Abstract
In response to the fight against the COVID-19 pandemic, researchers in machine learning and artificial intelligence have constructed medical knowledge graphs (KGs) based on existing COVID-19 datasets; however, these KGs contain a considerable number of semantic relations that are incomplete or missing. In this paper, we focus on the task of knowledge graph embedding (KGE), which serves as an important solution for inferring the missing relations. In the past, a collection of knowledge graph embedding models with different scoring functions for learning entity and relation embeddings have been published. However, these models share the same problem: they rarely take important KG features, such as attribute features beyond relation triples, into account when dealing with the heterogeneous, complex and incomplete COVID-19 medical data. To address this issue, we propose a graph feature collection network (GFCNet) for the COVID-19 KGE task, which considers both neighbor and attribute features in KGs. The extensive experiments conducted on the COVID-19 drug KG dataset show promising results and prove the effectiveness and efficiency of our proposed model. In addition, we discuss future directions for deepening the study of the COVID-19 KGE task. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
27. Incorporating global–local neighbors with Gaussian mixture embedding for few-shot knowledge graph completion.
- Author
-
Xie, Penghui, Zhou, Guangyou, Liu, Jin, and Huang, Jimmy Xiangji
- Subjects
- *
KNOWLEDGE graphs , *GAUSSIAN mixture models , *SHOT peening - Abstract
Few-shot knowledge graph completion (FKGC) aims to predict the missing parts of a query triplet based on a small number of known samples. To solve this task, many existing approaches enhance entity embeddings by encoding local neighbor information and obtain few-shot relational representations by encoding support triples. Although these previous studies have achieved promising results, they still suffer from two challenges: (1) remote neighbors contain rich semantic information, so how can remote neighbor information be encoded effectively? (2) Low-frequency relations and complex relations in the knowledge graph lead to uncertainty in relation semantics, so how can the uncertainty of few-shot relations be modeled effectively? For the former issue, we propose a global–local neighbor encoding module, where the global encoder captures remote neighbor features based on relation paths and the local encoder uses a task-aware attention mechanism to capture local neighbor features. For the latter issue, we employ an adaptive Gaussian mixture model to model few-shot relations, which can adapt to different queries by dynamically adjusting component weights. Link prediction experiments are conducted on two benchmark datasets, NELL-One and Wiki-One, on which the proposed model achieves improvements of 14.0% and 7.8%, respectively, in the Hits@1 evaluation metric compared to the strong baseline model FAAN. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
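The "adaptive Gaussian mixture" idea above, mixture components whose weights are re-derived per query, can be sketched in one dimension. This is a hypothetical illustration: the paper works with high-dimensional relation embeddings, and the softmax-over-affinities weighting below is an assumed stand-in for its actual adaptation mechanism.

```python
import math

def gaussian_pdf(x, mean, var):
    # Density of a 1-D Gaussian with the given mean and variance.
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def softmax(scores):
    # Turn per-component query affinities into normalized mixture weights.
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def mixture_score(x, components, weights):
    # Score a candidate under a Gaussian mixture; `weights` may be
    # recomputed per query, making the mixture adaptive.
    return sum(w * gaussian_pdf(x, m, v)
               for w, (m, v) in zip(weights, components))
```

A query resembling one component pushes that component's weight up, so the same relation can express different "senses" for different queries.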
28. DFM: A parameter-shared deep fused model for knowledge base question answering.
- Author
-
Zhou, Guangyou, Xie, Zhiwen, Yu, Zongfu, and Huang, Jimmy Xiangji
- Subjects
- *
KNOWLEDGE base , *NATURAL language processing , *MULTILEVEL models - Abstract
Currently, Knowledge Base Question Answering (KBQA) is an important research topic in the fields of information retrieval (IR) and natural language processing (NLP). The most common questions asked on the Web are simple questions, which can be answered by a single relational fact in a knowledge base (KB). However, answering simple questions automatically remains a challenging task in the IR and NLP research communities. Based on a review of various studies and a detailed analysis, we surmise that these challenges are primarily related to two concerns: (1) how to effectively access a large-scale KB; and (2) how to effectively reduce the gap between NL questions and the structured semantics in a KB. Most previous studies have treated these as two separate and independent subtasks: subject detection and predicate matching. Here, we propose a deep fused model that combines subject detection and predicate matching in a unified framework. Specifically, we employ a subject detection model to recognize the subject entity in a question, and a multilevel semantic model to learn semantic representations for questions and predicates. These models share parameters and can be trained jointly. We evaluated the proposed method on both English and Chinese KBQA datasets. The experimental results demonstrate that the proposed approach significantly outperforms state-of-the-art systems on both datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
29. SPRF: A semantic Pseudo-relevance Feedback enhancement for information retrieval via ConceptNet.
- Author
-
Pan, Min, Pei, Quanli, Liu, Yu, Li, Teng, Huang, Ellen Anne, Wang, Junmei, and Huang, Jimmy Xiangji
- Subjects
- *
INFORMATION retrieval , *PSYCHOLOGICAL feedback - Abstract
Pseudo-relevance feedback (PRF) is a widely acclaimed technique for information retrieval. However, traditional information retrieval approaches typically process the original query into individual terms, often overlooking the semantic information carried by the query terms themselves and focusing instead on features such as term frequency and inverse document frequency. In this paper, we propose a new semantic Pseudo-relevance Feedback model (SPRF) based on the PRF framework. Our SPRF model leverages ConceptNet to provide comprehensive semantic information between terms. It not only considers a query term's importance in the collection but also integrates its semantic information into the PRF framework to enhance the selection of query expansion terms, leading to more precise feedback documents for users. A series of experimental results show that our proposed SPRF model is feasible. Our model achieves good performance in terms of the MAP, P@10, NDCG and MRR metrics and demonstrates advantages over the baseline models, the state-of-the-art models and several neural-network-based methods. A comparison and analysis of the methods in a sample case show that the expansion terms produced by the proposed model are in better semantic agreement with the given query. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
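The combination described above, PRF frequency evidence plus term-to-term semantic relatedness, can be sketched with a toy scorer. The `relatedness` dict below is a hypothetical stand-in for edge weights one might fetch from ConceptNet; the linear interpolation with `alpha` and the function names are assumptions, not SPRF's actual formula.

```python
def expansion_scores(query_terms, candidate_counts, relatedness, alpha=0.5):
    # Score each candidate expansion term by interpolating its relative
    # frequency in the PRF documents with its best semantic relatedness
    # to any query term. `relatedness` maps (query_term, candidate) -> [0, 1].
    total = sum(candidate_counts.values()) or 1
    scores = {}
    for cand, cnt in candidate_counts.items():
        freq = cnt / total
        sem = max((relatedness.get((q, cand), 0.0) for q in query_terms),
                  default=0.0)
        scores[cand] = alpha * freq + (1 - alpha) * sem
    return scores

def top_expansions(query_terms, candidate_counts, relatedness, k=2, alpha=0.5):
    # Return the k highest-scoring expansion terms.
    s = expansion_scores(query_terms, candidate_counts, relatedness, alpha)
    return sorted(s, key=s.get, reverse=True)[:k]
```

The semantic term demotes frequent-but-unrelated candidates that a purely statistical PRF expansion would have promoted.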
30. Deep generative learning for automated EHR diagnosis of traditional Chinese medicine.
- Author
-
Liang, Zhaohui, Liu, Jun, Ou, Aihua, Zhang, Honglai, Li, Ziping, and Huang, Jimmy Xiangji
- Subjects
- *
DEEP learning , *CHINESE medicine , *INFORMATION storage & retrieval systems , *SUPERVISED learning , *SUPPORT vector machines , *ESSENTIAL hypertension - Abstract
Computer-aided medical decision-making (CAMDM) is a method of utilizing massive EMR data as both empirical and evidence support for the decision procedures of healthcare activities. Well-developed information infrastructure, such as hospital information systems and disease surveillance systems, provides abundant data for CAMDM. However, the complexity of EMR data, with its abstract medical knowledge, makes conventional models incompetent for the analysis. Thus, a deep belief network (DBN)-based model is proposed to simulate the information analysis and decision-making procedures of medical practice. The purpose of this paper is to evaluate a deep learning architecture as an effective solution for CAMDM. A two-step model is applied in our study. In the first step, an optimized seven-layer deep belief network (DBN) is applied as an unsupervised learning algorithm for model training to acquire feature representations. A support vector machine model is then adopted in the second, supervised learning step. Two data sets are used in the experiments: a plain-text data set indexed by medical experts, and a structured dataset on primary hypertension. The data are randomly divided to generate the training set for unsupervised learning and the testing set for supervised learning. Model performance is evaluated by the statistics of mean and variance and by the average precision and coverage on the data sets. Two conventional shallow models (a support vector machine / SVM and a decision tree / DT) are used as comparisons to show the superiority of our proposed approach. The deep learning (DBN + SVM) model outperforms the simple SVM and DT on both data sets in terms of all the evaluation measures, which confirms our motivation that the deep model is good at capturing key features with less dependence on manually built indexes.
Our study shows that the two-step deep learning model achieves high performance for medical information retrieval compared with conventional shallow models. It is able to capture the features of both plain text and the highly structured databases of EMR data. The performance of the deep model is superior to that of conventional shallow learning models such as SVM and DT, and it is an appropriate knowledge-learning model for information retrieval in EMR systems. Therefore, deep learning provides a good solution for improving the performance of CAMDM systems. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
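The two-step design above, unsupervised feature learning followed by a supervised SVM, can be sketched with off-the-shelf components. This is a simplified assumption: a single `BernoulliRBM` layer stands in for the paper's seven-layer DBN, and the function name and hyperparameters are illustrative.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

def build_rbm_svm(n_components=16, random_state=0):
    # Step 1 (unsupervised): an RBM learns a feature representation
    # from binary EMR-style indicator vectors.
    # Step 2 (supervised): an SVM classifies in the learned feature space.
    return Pipeline([
        ("rbm", BernoulliRBM(n_components=n_components, learning_rate=0.05,
                             n_iter=20, random_state=random_state)),
        ("svm", SVC(kernel="rbf")),
    ])
```

Stacking several RBM layers, pre-training each on the previous layer's activations, would bring this sketch closer to an actual DBN.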
31. MF‐Re‐Rank: A modality feature‐based Re‐Ranking model for medical image retrieval.
- Author
-
Ayadi, Hajer, Torjmen‐Khemakhem, Mouna, Daoud, Mariam, Huang, Jimmy Xiangji, and Ben Jemaa, Maher
- Subjects
- *
DIAGNOSTIC imaging , *DIGITAL diagnostic imaging , *EXPERIMENTAL design , *INFORMATION retrieval , *METADATA , *RESEARCH funding - Abstract
One of the main challenges in medical image retrieval is the increasing volume of image data, which renders it difficult for domain experts to find relevant information in large data sets. Effective and efficient medical image retrieval systems are required to better manage medical image information. Text-based image retrieval (TBIR) has been very successful in retrieving images with textual descriptions. Several TBIR approaches rely on bag-of-words models, in which the image retrieval problem turns into one of standard text-based information retrieval, and the meanings and values of specific medical entities in the text and metadata are ignored in the image representation and retrieval process. However, we believe that TBIR should extract specific medical entities and terms and then exploit these elements to achieve better image retrieval results. Therefore, we propose a novel reranking method based on medical-image-dependent features. These features are manually selected by a medical expert from imaging modalities and medical terminology. First, we represent queries and images using only medical-image-dependent features such as image modality and image scale. Second, we exploit the defined features in a new reranking method for medical image retrieval. Our motivation is the large influence of image modality in medical image retrieval and its impact on image-relevance scores. To evaluate our approach, we performed a series of experiments on the medical ImageCLEF data sets from 2009 to 2013. The BM25 model, a language model, and an image-relevance feedback model are used as baselines. The experimental results show that, compared to the BM25 model, the proposed model significantly enhances image retrieval performance. We also compared our approach with other state-of-the-art approaches and show that it performs comparably to the top three runs in the official ImageCLEF competition. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
32. Mining correlations between medically dependent features and image retrieval models for query classification.
- Author
-
Ayadi, Hajer, Torjmen‐Khemakhem, Mouna, Daoud, Mariam, Huang, Jimmy Xiangji, and Ben Jemaa, Maher
- Subjects
- *
ALGORITHMS , *CLASSIFICATION , *DIAGNOSTIC imaging , *INFORMATION retrieval , *MEDICAL information storage & retrieval systems , *RESEARCH funding , *MEDICAL subject headings - Abstract
The abundance of medical resources has encouraged the development of systems that allow efficient searches of information in large medical image data sets. State-of-the-art image retrieval models fall into three categories: content-based (visual) models, textual models, and combined models. Content-based models use visual features to answer image queries, textual image retrieval models use word matching to answer textual queries, and combined image retrieval models use both textual and visual features to answer queries. Nevertheless, most previous works in this field have used the same image retrieval model regardless of the query type. In this article, we define a list of generic and specific medical query features and exploit them in an association rule mining technique to discover correlations between query features and image retrieval models. Based on these rules, we propose to use an associative classifier (NaiveClass) to find the most suitable retrieval model for a new textual query. We also propose a second associative classifier (SmartClass) to select the most appropriate default class for the query. Experiments are performed on Medical ImageCLEF queries from 2008 to 2012 to evaluate the impact of the proposed query features on classification performance. The results show that combining our proposed specific and generic query features is effective for query classification. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
33. Special issue on advances in web intelligence
- Author
-
Rüger, Stefan, Raghavan, Vijay V., King, Irwin, and Huang, Jimmy Xiangji
- Published
- 2012
- Full Text
- View/download PDF