Author: "Zhan, Xunlin" - Searchworks@Jio Institute Digital Library Search Results

Your search for "Zhan, Xunlin" returned 12 results

1. Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval

Author: Dong, Xiao, Zhan, Xunlin, Wei, Yunchao, Wei, Xiaoyong, Wang, Yaowei, Lu, Minlong, Cao, Xiaochun, and Liang, Xiaodan
Subjects: Computer Science - Multimedia, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Databases, Computer Science - Information Retrieval
Abstract: Our goal in this research is to study a more realistic environment in which we can conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories. We first contribute the Product1M dataset and define two practical instance-level retrieval tasks to enable evaluation on price comparison and personalized recommendation. For both instance-level tasks, accurately pinpointing the product target mentioned in the visual-linguistic data and effectively decreasing the influence of irrelevant content is quite challenging. To address this, we train a more effective cross-modal pretraining model that adaptively incorporates key concept information from the multi-modal data by using an entity graph whose nodes and edges respectively denote entities and the similarity relations between entities. Specifically, a novel Entity-Graph Enhanced Cross-Modal Pretraining (EGE-CMP) model is proposed for instance-level commodity retrieval. It explicitly injects entity knowledge in both node-based and subgraph-based ways into the multi-modal networks via a self-supervised hybrid-stream transformer, which reduces the confusion between different object contents and thereby guides the network to focus on entities with real semantics. Experimental results verify the efficacy and generalizability of EGE-CMP, which outperforms several SOTA cross-modal baselines such as CLIP, UNITER and CAPTURE.
Published: 2022

2. elBERto: Self-supervised Commonsense Learning for Question Answering

Author: Zhan, Xunlin, Li, Yuan, Dong, Xiao, Liang, Xiaodan, Hu, Zhiting, and Carin, Lawrence
Subjects: Computer Science - Computation and Language
Abstract: Commonsense question answering requires reasoning about everyday situations and the causes and effects implicit in context. Typically, existing approaches first retrieve external evidence and then perform commonsense reasoning using this evidence. In this paper, we propose a Self-supervised Bidirectional Encoder Representation Learning of Commonsense (elBERto) framework, which is compatible with off-the-shelf QA model architectures. The framework comprises five self-supervised tasks that force the model to fully exploit the additional training signals from contexts containing rich commonsense. The tasks include a novel Contrastive Relation Learning task to encourage the model to distinguish between logically contrastive contexts, a new Jigsaw Puzzle task that requires the model to infer logical chains in long contexts, and three classic SSL tasks to maintain the pre-trained model's language encoding ability. On the representative WIQA, CosmosQA, and ReClor datasets, elBERto outperforms all other methods, including those utilizing explicit graph reasoning and external knowledge retrieval. Moreover, elBERto achieves substantial improvements on out-of-paragraph and no-effect questions where simple lexical similarity comparison does not help, indicating that it successfully learns commonsense and is able to leverage it when given dynamic context.
Published: 2022

3. M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining

Author: Dong, Xiao, Zhan, Xunlin, Wu, Yangxin, Wei, Yunchao, Kampffmeyer, Michael C., Wei, Xiaoyong, Lu, Minlong, Wang, Yaowei, and Liang, Xiaodan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia
Abstract: Despite the potential of multi-modal pre-training to learn highly discriminative feature representations from complementary data modalities, current progress is being slowed by the lack of large-scale modality-diverse datasets. By leveraging the natural suitability of E-commerce, where different modalities capture complementary semantic information, we contribute a large-scale multi-modal pre-training dataset M5Product. The dataset comprises 5 modalities (image, text, table, video, and audio), covers over 6,000 categories and 5,000 attributes, and is 500 times larger than the largest publicly available dataset with a similar number of modalities. Furthermore, M5Product contains incomplete modality pairs and noise while also having a long-tailed distribution, resembling most real-world problems. We further propose Self-harmonized ContrAstive LEarning (SCALE), a novel pretraining framework that integrates the different modalities into a unified model through an adaptive feature fusion mechanism, where the importance of each modality is learned directly from the modality embeddings and impacts the inter-modality contrastive learning and masked tasks within a multi-modal transformer model. We evaluate the current multi-modal pre-training state-of-the-art approaches and benchmark their ability to learn from unlabeled data when faced with the large number of modalities in the M5Product dataset. We conduct extensive experiments on four downstream tasks and demonstrate the superiority of our SCALE model, providing insights into the importance of dataset scale and diversity.
Comment: CVPR 2022
Published: 2021

4. Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining

Author: Zhan, Xunlin, Wu, Yangxin, Dong, Xiao, Wei, Yunchao, Lu, Minlong, Zhang, Yichi, Xu, Hang, and Liang, Xiaodan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Nowadays, customers' demands for E-commerce are more diversified, which introduces more complications to the product retrieval industry. Previous methods are either limited to single-modal input or perform supervised image-level product retrieval, and thus fail to accommodate real-life scenarios where enormous amounts of weakly annotated multi-modal data are present. In this paper, we investigate a more realistic setting that aims to perform weakly-supervised multi-modal instance-level product retrieval among fine-grained product categories. To promote the study of this challenging task, we contribute Product1M, one of the largest multi-modal cosmetic datasets for real-world instance-level retrieval. Notably, Product1M contains over 1 million image-caption pairs and consists of two sample types, i.e., single-product and multi-product samples, which encompass a wide variety of cosmetics brands. In addition to its great diversity, Product1M has several appealing characteristics, including fine-grained categories, complex combinations, and fuzzy correspondence, that closely mimic real-world scenes. Moreover, we propose a novel model named Cross-modal contrAstive Product Transformer for instance-level prodUct REtrieval (CAPTURE), which excels at capturing the potential synergy between multi-modal inputs via a hybrid-stream transformer in a self-supervised manner. CAPTURE generates discriminative instance features via masked multi-modal learning as well as cross-modal contrastive pretraining, and it outperforms several SOTA cross-modal baselines. Extensive ablation studies demonstrate the effectiveness and the generalization capacity of our model. Dataset and code are available at https://github.com/zhanxlin/Product1M.
Published: 2021

5. REM-Net: Recursive Erasure Memory Network for Commonsense Evidence Refinement

Author: Huang, Yinya, Fang, Meng, Zhan, Xunlin, Cao, Qingxing, Liang, Xiaodan, and Lin, Liang
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: When answering a question, people often draw upon their rich world knowledge in addition to the particular context. While recent works retrieve supporting facts/evidence from commonsense knowledge bases to supply additional information to each question, there is still ample opportunity to improve the quality of the evidence. This is crucial since the quality of the evidence is the key to answering commonsense questions, and even determines the upper bound on the QA system's performance. In this paper, we propose a recursive erasure memory network (REM-Net) to improve the quality of the evidence. REM-Net is equipped with a module that refines the evidence by recursively erasing low-quality evidence that does not explain the question answering. In addition, instead of retrieving evidence from existing knowledge bases, REM-Net leverages a pre-trained generative model to generate candidate evidence customized for the question. We conduct experiments on two commonsense question answering datasets, WIQA and CosmosQA. The results demonstrate the performance of REM-Net and show that the refined evidence is explainable.
Comment: Accepted by AAAI 2021
Published: 2020

6. PathReasoner: Explainable reasoning paths for commonsense question answering

Author: Zhan, Xunlin, Huang, Yinya, Dong, Xiao, Cao, Qingxing, and Liang, Xiaodan
Published: 2022

7. Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval

Author: Dong, Xiao, primary, Zhan, Xunlin, additional, Wei, Yunchao, additional, Wei, Xiaoyong, additional, Wang, Yaowei, additional, Lu, Minlong, additional, Cao, Xiaochun, additional, and Liang, Xiaodan, additional
Published: 2023

8. elBERto: Self-supervised commonsense learning for question answering

Author: Zhan, Xunlin, primary, Li, Yuan, additional, Dong, Xiao, additional, Liang, Xiaodan, additional, Hu, Zhiting, additional, and Carin, Lawrence, additional
Published: 2022

9. M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining

Author: Dong, Xiao, primary, Zhan, Xunlin, additional, Wu, Yangxin, additional, Wei, Yunchao, additional, Kampffmeyer, Michael C., additional, Wei, Xiaoyong, additional, Lu, Minlong, additional, Wang, Yaowei, additional, and Liang, Xiaodan, additional
Published: 2022

10. Caption-aided Product Detection via Collaborative Pseudo-Label Harmonization

Author: Dong, Xiao, primary, Zhang, Gengwei, additional, Zhan, Xunlin, additional, Ding, Yi, additional, Wei, Yunchao, additional, Lu, Minlong, additional, and Liang, Xiaodan, additional
Published: 2022

11. Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-Modal Pretraining

Author: Zhan, Xunlin, primary, Wu, Yangxin, additional, Dong, Xiao, additional, Wei, Yunchao, additional, Lu, Minlong, additional, Zhang, Yichi, additional, Xu, Hang, additional, and Liang, Xiaodan, additional
Published: 2021

12. REM-Net: Recursive Erasure Memory Network for Commonsense Evidence Refinement

Author: Huang, Yinya, primary, Fang, Meng, additional, Zhan, Xunlin, additional, Cao, Qingxing, additional, and Liang, Xiaodan, additional
Published: 2021
