Author: "Shen, Jialie" / Publication Type: Electronic Resources - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Shen, Jialie"' showing total 15 results

1. Adaptive Multi-Modality Prompt Learning

Author: Wu, Zongqian; Liu, Yujing; Zhan, Mengmeng; Shen, Jialie; Hu, Ping; Zhu, Xiaofeng
Abstract: Although current prompt learning methods have successfully been designed to effectively reuse the large pre-trained models without fine-tuning their large number of parameters, they still have limitations to be addressed, i.e., without considering the adverse impact of meaningless patches in every image and without simultaneously considering in-sample generalization and out-of-sample generalization. In this paper, we propose an adaptive multi-modality prompt learning to address the above issues. To do this, we employ previous text prompt learning and propose a new image prompt learning. The image prompt learning achieves in-sample and out-of-sample generalization, by first masking meaningless patches and then padding them with the learnable parameters and the information from texts. Moreover, each of the prompts provides auxiliary information to each other, further strengthening these two kinds of generalization. Experimental results on real datasets demonstrate that our method outperforms SOTA methods, in terms of different downstream tasks.
Published: 2023

2. Rethinking the Localization in Weakly Supervised Object Localization

Author: Xu, Rui; Luo, Yong; Hu, Han; Du, Bo; Shen, Jialie; Wen, Yonggang
Abstract: Weakly supervised object localization (WSOL) is one of the most popular and challenging tasks in computer vision. This task is to localize the objects in the images given only the image-level supervision. Recently, dividing WSOL into two parts (class-agnostic object localization and object classification) has become the state-of-the-art pipeline for this task. However, existing solutions under this pipeline usually suffer from the following drawbacks: 1) they are not flexible since they can only localize one object for each image due to the adopted single-class regression (SCR) for localization; 2) the generated pseudo bounding boxes may be noisy, but the negative impact of such noise is not well addressed. To remedy these drawbacks, we first propose to replace SCR with a binary-class detector (BCD) for localizing multiple objects, where the detector is trained by discriminating the foreground and background. Then we design a weighted entropy (WE) loss using the unlabeled data to reduce the negative impact of noisy bounding boxes. Extensive experiments on the popular CUB-200-2011 and ImageNet-1K datasets demonstrate the effectiveness of our method.
Comment: Accepted by ACM International Conference on Multimedia 2023
Published: 2023

3. LGViT: Dynamic Early Exiting for Accelerating Vision Transformer

Author: Xu, Guanyu; Hao, Jiawei; Shen, Li; Hu, Han; Luo, Yong; Lin, Hui; Shen, Jialie
Abstract: Recently, the efficient deployment and acceleration of powerful vision transformers (ViTs) on resource-limited edge devices for providing multimedia services have become attractive tasks. Although early exiting is a feasible solution for accelerating inference, most works focus on convolutional neural networks (CNNs) and transformer models in natural language processing (NLP). Moreover, the direct application of early exiting methods to ViTs may result in substantial performance degradation. To tackle this challenge, we systematically investigate the efficacy of early exiting in ViTs and point out that the insufficient feature representations in shallow internal classifiers and the limited ability to capture target semantic information in deep internal classifiers restrict the performance of these methods. We then propose an early exiting framework for general ViTs termed LGViT, which incorporates heterogeneous exiting heads, namely, local perception head and global aggregation head, to achieve an efficiency-accuracy trade-off. In particular, we develop a novel two-stage training scheme, including end-to-end training and self-distillation with the backbone frozen to generate early exiting ViTs, which facilitates the fusion of global and local information extracted by the two types of heads. We conduct extensive experiments using three popular ViT backbones on three vision datasets. Results demonstrate that our LGViT can achieve competitive performance with approximately 1.8× speed-up.
Comment: ACM MM 2023
Published: 2023

4. Pseudo-Pair based Self-Similarity Learning for Unsupervised Person Re-identification

Author: Wu, Lin; Liu, Deyin; Zhang, Wenying; Chen, Dapeng; Ge, Zongyuan; Boussaid, Farid; Bennamoun, Mohammed; Shen, Jialie
Abstract: Person re-identification (re-ID) is of great importance to video surveillance systems by estimating the similarity between a pair of cross-camera person shots. Current methods for estimating such similarity require a large number of labeled samples for supervised training. In this paper, we present a pseudo-pair based self-similarity learning approach for unsupervised person re-ID without human annotations. Unlike conventional unsupervised re-ID methods that use pseudo labels based on global clustering, we construct patch surrogate classes as initial supervision, and propose to assign pseudo labels to images through the pairwise gradient-guided similarity separation. This can cluster images in pseudo pairs, and the pseudo labels can be updated during training. Based on pseudo pairs, we propose to improve the generalization of the similarity function via a novel self-similarity learning: it learns local discriminative features from individual images via intra-similarity, and discovers the patch correspondence across images via inter-similarity. The intra-similarity learning is based on channel attention to detect diverse local features from an image. The inter-similarity learning employs a deformable convolution with a non-local block to align patches for cross-image similarity. Experimental results on several re-ID benchmark datasets demonstrate the superiority of the proposed method over the state of the art.
Comment: Under review
Published: 2022

5. Understanding the Teaching Styles by an Attention based Multi-task Cross-media Dimensional modelling

Author: Zhou, Suping; Jia, Jia; Yin, Yufeng; Li, Xiang; Yao, Yang; Zhang, Ying; Ye, Zeyang; Lei, Kehua; Huang, Yan; Shen, Jialie
Abstract: Teaching style plays an influential role in helping students to achieve academic success. In this paper, we explore a new problem of effectively understanding teachers' teaching styles. Specifically, we study 1) how to quantitatively characterize various teachers' teaching styles and 2) how to model the subtle relationship between cross-media teaching-related data (speech, facial expressions, body motions, content, etc.) and teaching styles. Using the adjectives selected from more than 10,000 feedback questionnaires provided by an educational enterprise, a novel concept called Teaching Style Semantic Space (TSSS) is developed based on the pleasure-arousal dimensional theory to describe teaching styles quantitatively and comprehensively. Then a multi-task deep learning based model, Attention-based Multi-path Multi-task Deep Neural Network (AMMDNN), is proposed to accurately and robustly capture the internal correlations between cross-media features and TSSS. Based on the benchmark dataset, we further develop a comprehensive data set including 4,541 fully annotated cross-modality teaching classes. Our experimental results demonstrate that the proposed AMMDNN outperforms baseline methods (+0.0842 on average in terms of the concordance correlation coefficient (CCC)). To further demonstrate the advantages of the proposed TSSS and our model, several interesting case studies are carried out, such as teaching styles comparison among different teachers and courses, and leveraging the proposed method for teaching quality analysis.
Comment: ACM Multimedia 2019
Published: 2019
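
The abstract of result 5 reports its gain in terms of the concordance correlation coefficient (CCC). As a reminder of what that metric measures, here is a minimal sketch of the standard CCC definition (Lin's coefficient, using population variances). The function name and the handling of the degenerate all-constant case are assumptions made for illustration; this is not code from the paper.

```python
from statistics import fmean


def concordance_correlation_coefficient(predicted, target):
    """Standard CCC: 2*cov / (var_p + var_t + (mean_p - mean_t)^2)."""
    if len(predicted) != len(target) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    mean_p = fmean(predicted)
    mean_t = fmean(target)
    # Population (biased) variances and covariance.
    var_p = fmean([(p - mean_p) ** 2 for p in predicted])
    var_t = fmean([(t - mean_t) ** 2 for t in target])
    cov = fmean([(p - mean_p) * (t - mean_t) for p, t in zip(predicted, target)])

    denominator = var_p + var_t + (mean_p - mean_t) ** 2
    if denominator == 0:
        # Assumption: two identical constant series count as perfect agreement.
        return 1.0
    return 2 * cov / denominator
```

Unlike Pearson correlation, CCC penalizes a constant offset between predictions and targets, which is why it is a common choice for dimensional (e.g. pleasure-arousal) regression.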

6. Exploring Representativeness and Informativeness for Active Learning

Author: Du, Bo; Wang, Zengmao; Zhang, Lefei; Zhang, Liangpei; Liu, Wei; Shen, Jialie; Tao, Dacheng
Abstract: How can we find a general way to choose the most suitable samples for training a classifier? Even with very limited prior information? Active learning, which can be regarded as an iterative optimization procedure, plays a key role to construct a refined training set to improve the classification performance in a variety of applications, such as text analysis, image recognition, social network modeling, etc. Although combining representativeness and informativeness of samples has been proven promising for active sampling, state-of-the-art methods perform well under certain data structures. Then can we find a way to fuse the two active sampling criteria without any assumption on data? This paper proposes a general active learning framework that effectively fuses the two criteria. Inspired by a two-sample discrepancy problem, triple measures are elaborately designed to guarantee that the query samples not only possess the representativeness of the unlabeled data but also reveal the diversity of the labeled data. Any appropriate similarity measure can be employed to construct the triple measures. Meanwhile, an uncertain measure is leveraged to generate the informativeness criterion, which can be carried out in different ways. Rooted in this framework, a practical active learning algorithm is proposed, which exploits a radial basis function together with the estimated probabilities to construct the triple measures and a modified Best-versus-Second-Best strategy to construct the uncertain measure, respectively. Experimental results on benchmark datasets demonstrate that our algorithm consistently achieves superior performance over the state-of-the-art active learning algorithms.
Published: 2019
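
The abstract of result 6 mentions a modified Best-versus-Second-Best (BvSB) strategy as its uncertainty measure. The sketch below shows only the plain BvSB margin (the gap between the two highest class probabilities), not the paper's modified variant, which the abstract does not describe. The function names are hypothetical.

```python
def bvsb_margin(class_probabilities):
    """Gap between the best and second-best class probability.

    A small margin means the classifier is torn between two classes,
    so the sample is considered more informative to label.
    """
    if len(class_probabilities) < 2:
        raise ValueError("need at least two class probabilities")
    best, second_best = sorted(class_probabilities, reverse=True)[:2]
    return best - second_best


def select_most_uncertain(probability_rows, k):
    """Return indices of the k samples with the smallest BvSB margin."""
    ranked = sorted(
        range(len(probability_rows)),
        key=lambda i: bvsb_margin(probability_rows[i]),
    )
    return ranked[:k]
```

For example, with predicted probabilities `[[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.6, 0.3, 0.1]]`, the second row has the smallest margin and would be queried first.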

7. NAIRS: A Neural Attentive Interpretable Recommendation System

Author: Yu, Shuai; Wang, Yongbo; Yang, Min; Li, Baocheng; Qu, Qiang; Shen, Jialie
Abstract: In this paper, we develop a neural attentive interpretable recommendation system, named NAIRS. A self-attention network, as a key component of the system, is designed to assign attention weights to interacted items of a user. This attention mechanism can distinguish the importance of the various interacted items in contributing to a user profile. Based on the user profiles obtained by the self-attention network, NAIRS offers personalized high-quality recommendation. Moreover, it develops visual cues to interpret recommendations. This demo application with the implementation of NAIRS enables users to interact with a recommendation system, and it persistently collects training data to improve the system. The demonstration and experimental results show the effectiveness of NAIRS.
Comment: This paper was published as a demonstration paper at WSDM'19. In this version, we added a detailed related work section.
Published: 2019

8. Unsupervised Deep Video Hashing with Balanced Rotation

Author: Sierra, Carles; Wu, Gengshen; Liu, Li; Guo, Yuchen; Ding, Guiguang; Han, Jungong; Shen, Jialie; Shao, Ling
Abstract: Recently, hashing video contents for fast retrieval has received increasing attention due to the enormous growth of online videos. As the extension of image hashing techniques, traditional video hashing methods mainly focus on seeking the appropriate video features but pay little attention to how the video-specific features can be leveraged to achieve optimal binarization. In this paper, an end-to-end hashing framework, namely Unsupervised Deep Video Hashing (UDVH), is proposed, where feature extraction, balanced code learning and hash function learning are integrated and optimized in a self-taught manner. Particularly, distinguished from previous work, our framework enjoys two novelties: 1) an unsupervised hashing method that integrates the feature clustering and feature binarization, enabling the neighborhood structure to be preserved in the binary space; 2) a smart rotation applied to the video-specific features that are widely spread in the low-dimensional space such that the variance of dimensions can be balanced, thus generating more effective hash codes. Extensive experiments have been performed on two real-world datasets and the results demonstrate its superiority, compared to the state-of-the-art video hashing methods. To bootstrap further developments, the source code will be made publicly available.
Published: 2017

9. Dynamic Multi-view Hashing for Online Image Retrieval

Author: Sierra, Carles; Xie, Liang; Shen, Jialie; Han, Jungong; Zhu, Lei; Shao, Ling
Abstract: Advanced hashing technique is essential to facilitate effective large scale online image organization and retrieval, where image contents could be frequently changed. Traditional multi-view hashing methods are developed based on batch-based learning, which leads to very expensive updating cost. Meanwhile, existing online hashing methods mainly focus on single-view data and thus cannot achieve promising performance when searching real online images, which are multiple view based data. Further, both types of hashing methods can only produce hash code with fixed length. Consequently, they suffer from limited capability for comprehensive characterization of streaming image data in the real world. In this paper, we propose dynamic multi-view hashing (DMVH), which can adaptively augment hash codes according to dynamic changes of image. Meanwhile, DMVH leverages online learning to generate hash codes. It can increase the code length when current code is not able to represent new images effectively. Moreover, to gain further improvement on overall performance, each view is assigned a weight, which can be efficiently updated during the online learning process. In order to avoid the frequent updating of code length and view weights, an intelligent buffering scheme is also specifically designed to preserve significant data to maintain good effectiveness of DMVH. Experimental results on two real-world image datasets demonstrate superior performance of DMVH over several state-of-the-art hashing methods.
Published: 2017

10. Unsupervised Deep Video Hashing with Balanced Rotation

Author: Sierra, Carles; Wu, Gengshen; Liu, Li; Guo, Yuchen; Ding, Guiguang; Han, Jungong; Shen, Jialie; Shao, Ling
Abstract: Recently, hashing video contents for fast retrieval has received increasing attention due to the enormous growth of online videos. As the extension of image hashing techniques, traditional video hashing methods mainly focus on seeking the appropriate video features but pay little attention to how the video-specific features can be leveraged to achieve optimal binarization. In this paper, an end-to-end hashing framework, namely Unsupervised Deep Video Hashing (UDVH), is proposed, where feature extraction, balanced code learning and hash function learning are integrated and optimized in a self-taught manner. Particularly, distinguished from previous work, our framework enjoys two novelties: 1) an unsupervised hashing method that integrates the feature clustering and feature binarization, enabling the neighborhood structure to be preserved in the binary space; 2) a smart rotation applied to the video-specific features that are widely spread in the low-dimensional space such that the variance of dimensions can be balanced, thus generating more effective hash codes. Extensive experiments have been performed on two real-world datasets and the results demonstrate its superiority, compared to the state-of-the-art video hashing methods. To bootstrap further developments, the source code will be made publicly available.
Published: 2017

11. Dynamic Multi-view Hashing for Online Image Retrieval

Author: Sierra, Carles; Xie, Liang; Shen, Jialie; Han, Jungong; Zhu, Lei; Shao, Ling
Abstract: Advanced hashing technique is essential to facilitate effective large scale online image organization and retrieval, where image contents could be frequently changed. Traditional multi-view hashing methods are developed based on batch-based learning, which leads to very expensive updating cost. Meanwhile, existing online hashing methods mainly focus on single-view data and thus cannot achieve promising performance when searching real online images, which are multiple view based data. Further, both types of hashing methods can only produce hash code with fixed length. Consequently, they suffer from limited capability for comprehensive characterization of streaming image data in the real world. In this paper, we propose dynamic multi-view hashing (DMVH), which can adaptively augment hash codes according to dynamic changes of image. Meanwhile, DMVH leverages online learning to generate hash codes. It can increase the code length when current code is not able to represent new images effectively. Moreover, to gain further improvement on overall performance, each view is assigned a weight, which can be efficiently updated during the online learning process. In order to avoid the frequent updating of code length and view weights, an intelligent buffering scheme is also specifically designed to preserve significant data to maintain good effectiveness of DMVH. Experimental results on two real-world image datasets demonstrate superior performance of DMVH over several state-of-the-art hashing methods.
Published: 2017

12. Distribution-based similarity measures for multi-dimensional point set retrieval applications

Author: Shao, Jie; Huang, Zi; Shen, Heng Tao; Shen, Jialie; Zhou, Xiaofang
Abstract: Effective and efficient method of similarity assessment continues to be one of the most fundamental problems in multimedia data analysis. In case of retrieving relevant items from a collection of objects based on series of multivariate observations (e.g., searching the similar video clips in a repository to a query example), satisfactory performance cannot be expected using many conventional similarity measures based on the aggregation of element pairwise comparisons. Some correlation information among the individual elements has also been investigated to characterize each set of multidimensional points for ranked retrieval, by making use of an unwarranted assumption that the underlying data distribution has a particular parametric form. Motivated by this observation, this paper introduces a novel collective gauge of relevance ranking by evaluating the probabilities that point sets are consistent with the same distribution of the query. Two non-parametric hypothesis tests in statistics are justified to exploit the distributional discrepancy of samples for assessing the similarity between two ensembles of points. While our methodology is mainly presented in the context of video similarity search, it enjoys great flexibility and can be easily adapted to other applications involving generic multi-dimensional point set representation for each object such as human gesture recognition. Copyright 2008 ACM.
Published: 2008

13. Distribution-based similarity measures for multi-dimensional point set retrieval applications

Author: Shao, Jie; Huang, Zi; Shen, Heng Tao; Shen, Jialie; Zhou, Xiaofang
Abstract: Effective and efficient method of similarity assessment continues to be one of the most fundamental problems in multimedia data analysis. In case of retrieving relevant items from a collection of objects based on series of multivariate observations (e.g., searching the similar video clips in a repository to a query example), satisfactory performance cannot be expected using many conventional similarity measures based on the aggregation of element pairwise comparisons. Some correlation information among the individual elements has also been investigated to characterize each set of multidimensional points for ranked retrieval, by making use of an unwarranted assumption that the underlying data distribution has a particular parametric form. Motivated by this observation, this paper introduces a novel collective gauge of relevance ranking by evaluating the probabilities that point sets are consistent with the same distribution of the query. Two non-parametric hypothesis tests in statistics are justified to exploit the distributional discrepancy of samples for assessing the similarity between two ensembles of points. While our methodology is mainly presented in the context of video similarity search, it enjoys great flexibility and can be easily adapted to other applications involving generic multi-dimensional point set representation for each object such as human gesture recognition. Copyright 2008 ACM.
Published: 2008

14. Advanced query processing on large multimedia databases

Author: Shen, Jialie (Computer Science and Engineering, Faculty of Engineering, UNSW)
Published: 2006

15. Exploring Bit-Difference for Approximate KNN Search in High-dimensional Databases

Author: Cui, Bin; Shen, Heng Tao; Shen, Jialie; Tan, Kian Lee
Abstract: In this paper, we develop a novel index structure to support efficient approximate k-nearest neighbor (KNN) query in high-dimensional databases. In high-dimensional spaces, the computational cost of the distance (e.g., Euclidean distance) between two points contributes a dominant portion of the overall query response time for memory processing. To reduce the distance computation, we first propose a structure (BID) using BIt-Difference to answer approximate KNN query. The BID employs one bit to represent each feature vector of point and the number of bit-difference is used to prune the further points. To handle real datasets, which are typically skewed, we enhance the BID mechanism with clustering, cluster adapted bitcoder and dimensional weight, named the BID⁺. Extensive experiments are conducted to show that our proposed method yields significant performance advantages over the existing index structures on both real life and synthetic high-dimensional datasets.
Comment: Singapore-MIT Alliance (SMA)
Published: 2004
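
The abstract of result 15 describes pruning far points by counting differing bits before computing exact distances. The sketch below illustrates that general idea only; it is not the BID or BID⁺ structure. It assumes one bit per dimension, set by thresholding against the per-dimension mean of the dataset, and a fixed shortlist size; the abstract specifies neither, and all function and parameter names are hypothetical.

```python
import math


def encode_bits(point, thresholds):
    """Pack one bit per dimension: 1 if the value is at or above the threshold."""
    code = 0
    for value, threshold in zip(point, thresholds):
        code = (code << 1) | (1 if value >= threshold else 0)
    return code


def approximate_knn(points, query, k, shortlist_size):
    """Shortlist by bit difference, then rank the shortlist by Euclidean distance."""
    dims = len(query)
    # Assumption: the per-dimension mean serves as the binarization threshold.
    thresholds = [sum(p[d] for p in points) / len(points) for d in range(dims)]
    codes = [encode_bits(p, thresholds) for p in points]
    query_code = encode_bits(query, thresholds)

    # Cheap filter: number of differing bits (Hamming distance on the codes).
    by_bit_difference = sorted(
        range(len(points)),
        key=lambda i: bin(codes[i] ^ query_code).count("1"),
    )
    shortlist = by_bit_difference[:shortlist_size]

    # Expensive step only on the shortlist: exact Euclidean distance.
    shortlist.sort(key=lambda i: math.dist(points[i], query))
    return shortlist[:k]
```

The result is approximate because a true neighbor whose code differs in many bits can be dropped before the exact distance is ever computed; a larger `shortlist_size` trades speed for recall.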
