13 results for "Li, Xuelong"
Search Results
2. Reunion helper: an edge matcher for sibling fragment identification of the Dunhuang manuscript.
- Author
- Zheng, Yutong, Li, Xuelong, and Weng, Yu
- Abstract
The Dunhuang ancient manuscripts are a precious cultural heritage of humanity. However, due to their age, the vast majority of these treasures are damaged and fragmented. Faced with a wide range of sources and numerous fragments, restoration generally involves two core tasks: sibling fragment identification and fragment assembly. Currently, fragment restoration still relies heavily on manual labor. Over long practice, a consensus has been reached on the importance of edge features not only for assembly but also for identification. However, accurately extracting edge features and using them for efficient identification requires extensive knowledge and strong memory, which is a challenge for the human brain. Consequently, in previous studies, fragment edge features have been used for assembly validation but rarely for identification. Therefore, an edge matcher is proposed that works like a bloodhound, capable of "sniffing out" specific "flavors" in edge features and performing efficient sibling fragment identification accordingly, providing guidance when experts subsequently perform physical assembly. First, the fragment images are standardized. Second, traditional methods are used to compress the representation of fragment edges and obtain paired local edge images. Finally, these images are fed into the edge matcher for classification; the matcher is a CNN-based pairwise similarity metric model proposed in this paper, which introduces residual blocks and depthwise separable convolutions and adds multi-scale convolutional layers. With the edge matcher, a complex matching problem is transformed into a simple classification problem. In the absence of a standard public dataset, a Dunhuang manuscript fragment edge dataset is constructed. In experiments on that dataset, the accuracy, precision, recall, and F1 score of the edge matcher all exceed 97%. 
The effectiveness of the edge matcher is demonstrated by comparative experiments, and the rationality of the design is verified by ablation experiments. The method combines traditional and deep learning techniques to use the geometric edge features of fragments for sibling fragment identification in a natural rather than hand-coded way, making full use of the computer's computational and memory capabilities. The edge matcher can significantly reduce the time and scope of searching, matching, and inferring fragments, and assist in the reconstruction of Dunhuang ancient manuscript fragments. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
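The core move in the abstract above is reducing fragment matching to binary classification over paired edge representations. A toy sketch of that framing follows; the turning-angle descriptor, scoring function, and threshold are illustrative stand-ins for the paper's compressed edge images and trained CNN, not the authors' method:

```python
import math

def edge_descriptor(contour):
    """Toy stand-in for the paper's compressed edge representation:
    the sequence of turning angles along a fragment's torn edge."""
    return [math.atan2(y1 - y0, x1 - x0)
            for (x0, y0), (x1, y1) in zip(contour, contour[1:])]

def match_score(desc_a, desc_b):
    """Toy pairwise similarity (the paper trains a CNN for this step):
    1.0 for identical angle sequences, decaying with mean deviation."""
    n = min(len(desc_a), len(desc_b))
    if n == 0:
        return 0.0
    err = sum(abs(a - b) for a, b in zip(desc_a, desc_b)) / n
    return 1.0 / (1.0 + err)

def is_sibling(contour_a, contour_b, threshold=0.8):
    """The key reframing: matching becomes a yes/no classification."""
    return match_score(edge_descriptor(contour_a),
                       edge_descriptor(contour_b)) >= threshold
```

A matcher of this shape lets a large fragment pool be pre-screened automatically, narrowing the set of candidates that experts then assemble by hand.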
3. Gait feature learning via spatio-temporal two-branch networks.
- Author
- Chen, Yifan and Li, Xuelong
- Subjects
- PIXELS, DATA mining, FEATURE extraction
- Abstract
Gait recognition has become a mainstream identification technology due to its ability to capture gait features over long distances without subject cooperation and its resistance to camouflage. However, current gait recognition methods face challenges because they use a single network to extract both temporal and spatial features from gait sequences. This approach imposes a heavy burden on the network, reducing extraction efficiency. To solve this problem, we propose a two-branch network to extract the spatio-temporal features of gait sequences. One branch focuses primarily on spatial feature extraction, while the other concentrates on temporal feature extraction. This design lets each branch specialize in a single task, leading to significant performance improvements. For temporal feature extraction, we propose the Global Temporal Information Extraction Network (GTIEN). GTIEN extracts temporal features of gait sequences by sequentially exploring the relationship between adjacent gait silhouettes at the pixel and block levels. For spatial feature extraction, we introduce the Selective Horizontal Pyramid Convolution Network (SHPCN). SHPCN explores the multi-granularity features of gait silhouettes from global and local perspectives and assigns them weights according to their importance. By combining the temporal features extracted by GTIEN and the spatial features extracted by SHPCN, we can effectively learn the spatio-temporal information of gait sequences. Extensive experiments on CASIA-B and OUMVLP demonstrate that our method outperforms several state-of-the-art methods. • We sequentially explore the relationship between adjacent gait silhouettes at the pixel and block levels. • We explore global and local features of gait silhouettes at different ranges and selectively assign higher weights to important features. 
• We combine SHPCN and GTIEN into a two-branch network for spatio-temporal gait feature extraction. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
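The multi-granularity idea behind SHPCN in the abstract above resembles horizontal pyramid pooling of silhouettes. A minimal sketch follows; the strip counts and mean-occupancy features are simplifications, and the learned selection weights the paper describes are omitted:

```python
def horizontal_pyramid_features(silhouette, strip_counts=(1, 2, 4)):
    """Pool a binary gait silhouette (a list of pixel rows) into horizontal
    strips at several granularities; each strip yields one mean-occupancy
    feature. Coarse strips capture global shape, fine strips local parts."""
    h = len(silhouette)
    feats = []
    for n in strip_counts:                  # coarse -> fine granularity
        strip_h = h // n
        for i in range(n):
            strip = silhouette[i * strip_h:(i + 1) * strip_h]
            vals = [v for row in strip for v in row]
            feats.append(sum(vals) / len(vals))
    return feats
```

A 4-row silhouette with strip counts (1, 2, 4) thus yields 1 + 2 + 4 = 7 features, mixing global and local views of the same frame.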
4. Style Transformation-Based Spatial–Spectral Feature Learning for Unsupervised Change Detection.
- Author
- Liu, Ganchao, Yuan, Yuan, Zhang, Yuelin, Dong, Yongsheng, and Li, Xuelong
- Subjects
- RECURRENT neural networks, MULTISPECTRAL imaging, CONVOLUTIONAL neural networks, REMOTE sensing, SIGNAL convolution
- Abstract
Due to inconsistent imaging environments, the styles of multitemporal multispectral images (MSIs) can differ considerably, for example in image brightness and transparency. For multitemporal MSIs with different styles, the "same object with different spectra" problem is one of the biggest challenges in change detection. To overcome this challenge, a novel unsupervised spatial–spectral feature learning (FL) framework based on style transformation (ST), called STFL-CD, is proposed for MSI change detection in this article. For dual-temporal MSIs, the proposed STFL-CD algorithm consists of two phases: ST and spatial–spectral FL. Since image styles are inconsistent across imaging environments, the first innovation is to transform the image styles through unmixing and reconstruction; through ST, the "same object with different spectra" problem is fundamentally reduced. The second innovation is to extract joint spectral–spatial change features with a 3-D convolutional neural network equipped with spatial and channel attention. In addition, for multitemporal MSIs, a multitemporal version of STFL-CD (MT-STFL-CD) is designed based on a recurrent neural network to learn correlation features between multitemporal remote sensing images. Both the visual and the quantitative results on real MSI datasets indicate that the proposed unsupervised STFL-CD frameworks have significant advantages in multitemporal MSI change detection. In particular, the performance of the proposed unsupervised STFL-CD algorithm is even comparable to that of state-of-the-art supervised or semisupervised methods. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
5. Looking Closer at the Scene: Multiscale Representation Learning for Remote Sensing Image Scene Classification.
- Author
- Wang, Qi, Huang, Wei, Xiong, Zhitong, and Li, Xuelong
- Subjects
- REMOTE sensing, CONVOLUTIONAL neural networks, DISTANCE education, HYPERSPECTRAL imaging systems
- Abstract
Remote sensing image scene classification has attracted great attention because of its wide applications. Although convolutional neural network (CNN)-based methods for scene classification have achieved excellent results, the large-scale variation of features and objects in remote sensing images limits further improvement of classification performance. To address this issue, we present a multiscale representation for scene classification, realized by a global–local two-stream architecture. This architecture has two branches, a global stream and a local stream, which extract global features from the whole image and local features from its most important area, respectively. To locate the most important area using only image-level labels, a weakly supervised key area detection strategy called structured key area localization (SKAL) is designed to connect the two streams. To verify the effectiveness of the proposed SKAL-based two-stream architecture, we conduct comparative experiments based on three widely used CNN models (AlexNet, GoogleNet, and ResNet18) on four public remote sensing image scene classification data sets, and achieve state-of-the-art results on all four. Our code is available at https://github.com/hw2hwei/SKAL. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
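The global–local two-stream pipeline in the abstract above — classify the whole image, locate the most informative area from activations, classify the crop, fuse — can be sketched as below. The exhaustive window search and mean fusion are simplified guesses, not the exact SKAL procedure:

```python
def locate_key_area(score_map, win=2):
    """Find the win x win window with the highest summed activation in a
    2-D score map: a crude stand-in for structured key area localization,
    which needs only image-level supervision."""
    h, w = len(score_map), len(score_map[0])
    best, best_pos = float("-inf"), (0, 0)
    for r in range(h - win + 1):
        for c in range(w - win + 1):
            s = sum(score_map[r + i][c + j]
                    for i in range(win) for j in range(win))
            if s > best:
                best, best_pos = s, (r, c)
    return best_pos

def two_stream_predict(global_probs, local_probs):
    """Fuse global-stream and local-stream class probabilities
    (a simple mean; the paper's fusion may differ)."""
    return [(g + l) / 2 for g, l in zip(global_probs, local_probs)]
```

The returned window corner tells the local stream where to crop, so the second classifier "looks closer" at the scene without extra location labels.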
6. SSR-NET: Spatial–Spectral Reconstruction Network for Hyperspectral and Multispectral Image Fusion.
- Author
- Zhang, Xueting, Huang, Wei, Wang, Qi, and Li, Xuelong
- Subjects
- MULTISPECTRAL imaging, CONVOLUTIONAL neural networks, IMAGE fusion, IMAGE reconstruction
- Abstract
The fusion of a low-spatial-resolution hyperspectral image (LR-HSI) with its corresponding high-spatial-resolution multispectral image (HR-MSI) to reconstruct a high-spatial-resolution HSI (HR-HSI) has been a significant subject in recent years. Nevertheless, existing methods still struggle to achieve cross-mode information fusion of the spatial and spectral modes when reconstructing the HR-HSI. In this article, an interpretable convolutional neural network (CNN)-based spatial–spectral reconstruction network (SSR-NET) is proposed for more efficient HSI and MSI fusion. More specifically, the proposed SSR-NET is a physically straightforward model consisting of three components: 1) cross-mode message inserting (CMMI), an operation that produces the preliminary fused HR-HSI, preserving the most valuable information of the LR-HSI and HR-MSI; 2) the spatial reconstruction network (SpatRN), which concentrates on reconstructing the lost spatial information of the LR-HSI under the guidance of a spatial edge loss (L_spat); and 3) the spectral reconstruction network (SpecRN), which attends to reconstructing the lost spectral information of the HR-MSI under the constraint of a spectral edge loss (L_spec). Comparative experiments are conducted on six HSI data sets (Urban, Pavia University (PU), Pavia Center (PC), Botswana, Indian Pines (IP), and Washington DC Mall (WDCM)), and the proposed SSR-NET achieves superior or competitive results in comparison with seven state-of-the-art methods. The code of SSR-NET is available at https://github.com/hw2hwei/SSRNET. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
7. A Gated Recurrent Network With Dual Classification Assistance for Smoke Semantic Segmentation.
- Author
- Yuan, Feiniu, Zhang, Lin, Xia, Xue, Huang, Qinghua, and Li, Xuelong
- Subjects
- CONVOLUTIONAL neural networks, SMOKE
- Abstract
Smoke is semi-transparent, which leads to highly complicated mixtures of background and smoke. Sparse or small smoke is visually inconspicuous, and its boundary is often ambiguous. These properties make separating smoke from a single image a very challenging task. To solve these problems, we propose a Classification-assisted Gated Recurrent Network (CGRNet) for smoke semantic segmentation. To discriminate smoke from smoke-like objects, we present a smoke segmentation strategy with dual classification assistance. Our classification module outputs two prediction probabilities for smoke. The first assistance uses one probability to explicitly regulate the segmentation module for improved accuracy by supervising a cross-entropy classification loss. The second multiplies the segmentation result by the other probability for further refinement. This dual classification assistance greatly improves performance at the image level. In the segmentation module, we design an Attention Convolutional GRU module (Att-ConvGRU) to learn the long-range context dependence of features. To perceive small or inconspicuous smoke, we design a Multi-scale Context Contrasted Local Feature structure (MCCL) and a Dense Pyramid Pooling Module (DPPM) to improve the representation ability of our network. Extensive experiments validate that our method significantly outperforms existing state-of-the-art algorithms on smoke datasets, and it also obtains satisfactory results on challenging images with inconspicuous smoke and smoke-like objects. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
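The second assistance described in the abstract above — multiplying the pixel-wise segmentation by an image-level smoke probability — is simple enough to sketch directly (the first assistance is a training-time cross-entropy loss and is not shown; the function below is a minimal illustration, not the CGRNet code):

```python
def dual_classification_refine(seg_map, p_cls):
    """Scale a pixel-wise segmentation map (rows of probabilities) by the
    image-level smoke probability p_cls, so a confident 'no smoke'
    classification suppresses spurious segmentation responses."""
    return [[p * p_cls for p in row] for row in seg_map]
```

When the classifier is sure an image contains no smoke (p_cls near 0), the whole map is damped; when it is sure smoke is present, the segmentation passes through largely unchanged.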
8. Dense Prediction and Local Fusion of Superpixels: A Framework for Breast Anatomy Segmentation in Ultrasound Image With Scarce Data.
- Author
- Huang, Qinghua, Miao, Zhaoji, Zhou, Shichong, Chang, Cai, and Li, Xuelong
- Subjects
- CONVOLUTIONAL neural networks, IMAGE segmentation, ULTRASONIC imaging, BREAST, COMPUTER vision, ENDORECTAL ultrasonography, IMAGE fusion, COMPUTER performance
- Abstract
Segmentation of the breast ultrasound (BUS) image is an important step in the subsequent assessment and diagnosis of breast lesions. Recently, deep-learning-based methods have achieved satisfactory performance in many computer vision tasks, especially in medical image segmentation. Nevertheless, these methods typically require large amounts of pixel-wise labeled data, which is expensive to obtain in medical practice. In this study, we propose a new segmentation method based on dense prediction and local fusion of superpixels for breast anatomy with scarce labeled data. First, the method generates superpixels from the BUS image enhanced by histogram equalization, a bilateral filter, and a pyramid mean-shift filter. Second, using a convolutional neural network (CNN) and a distance-metric-learning-based classifier, the superpixels are projected onto an embedding space and classified by the distance between their embeddings and the category centers. Because many superpixels can be generated from each BUS image, a large number of training samples is available per image, which mitigates the scarcity of labeled data. To avoid misclassification of superpixels, K-nearest neighbor (KNN) reclassification is applied within every local region based on the spatial relationships among superpixels. Fivefold cross-validation was performed, and the experimental results show that our method outperforms several commonly used deep-learning methods when large amounts of labeled data are unavailable (48 BUS images for training and 12 for testing). [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
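The KNN reclassification step in the abstract above can be sketched on its own: each superpixel is relabeled by a majority vote among its spatially nearest neighbors. The squared-Euclidean distance on centroid positions and the tie-breaking below are simplifications, not the paper's exact procedure:

```python
from collections import Counter

def knn_relabel(labels, positions, k=3):
    """Reclassify each superpixel by majority vote among its k spatially
    nearest neighbours (by squared distance between superpixel centroids),
    smoothing isolated misclassifications within a local region."""
    out = []
    for i, (x, y) in enumerate(positions):
        dists = sorted(
            (((x - x2) ** 2 + (y - y2) ** 2, j)
             for j, (x2, y2) in enumerate(positions) if j != i))
        votes = Counter(labels[j] for _, j in dists[:k])
        out.append(votes.most_common(1)[0][0])
    return out
```

In the example below, one mislabeled superpixel inside each spatial cluster is corrected by its neighbors' votes.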
9. Multi-stage context refinement network for semantic segmentation.
- Author
- Liu, Qing, Dong, Yongsheng, and Li, Xuelong
- Subjects
- CONVOLUTIONAL neural networks, IMAGE segmentation, FEATURE extraction, FUZZY algorithms
- Abstract
Convolutional neural networks have been widely used in image semantic segmentation. However, repeated downsampling operations in convolutional neural networks (such as pooling or strided convolution) reduce the initial image resolution and lose spatial details, resulting in blurred segmentation results. To alleviate this problem, in this paper we propose a multi-stage context refinement network (MCRNet) for semantic segmentation. Specifically, we first construct a Lowest-resolution Chain Context Aggregation (LCCA) module to encode rich semantic information. To obtain more spatial detail, we further build a High-resolution Context Attention Refinement (HCAR) module consisting of context feature extraction and context feature refinement. Finally, MCRNet fuses the context information generated by LCCA and HCAR for pixel prediction. Experimental results on three challenging semantic segmentation datasets, namely PASCAL VOC2012, ADE20K, and Cityscapes, reveal that our proposed MCRNet is effective. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
10. Field-matching attention network for object detection.
- Author
- Dong, Yongsheng, Shen, Longchao, Pei, Yuanhua, Yang, Haotian, and Li, Xuelong
- Subjects
- CONVOLUTIONAL neural networks
- Abstract
The feature pyramid network (FPN) is widely used in object detection to divide and conquer objects of different scales and to fuse high- and low-level features, and it has achieved encouraging results in multi-scale object processing. However, due to the mismatch between receptive fields at different stages, directly fusing two features from different receptive fields may not achieve satisfactory results. Moreover, the simple lateral connections in FPN may lose spatial relationships and details. To alleviate these problems, in this paper we propose a field-matching attention network (FMANet) for object detection. In particular, we first propose a receptive field dilated module (RFDM), which normalizes the receptive fields of features at different stages to the same scale. Furthermore, to capture spatial information and details, we build a dual attention module (DAM) employing both spatial and channel attention. Using the two attention mechanisms simultaneously improves performance while maintaining speed. Finally, experimental results reveal that our proposed FMANet with DSPDarkNet-53 as the backbone achieves competitive detection performance. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
11. Multi-task hierarchical convolutional network for visual-semantic cross-modal retrieval.
- Author
- Ji, Zhong, Lin, Zhigang, Wang, Haoran, Pang, Yanwei, and Li, Xuelong
- Subjects
- CONVOLUTIONAL neural networks, MODAL logic, SEMANTICS
- Abstract
Bridging visual and textual representations plays a central role in multimedia data understanding. The main challenge arises from the fact that images and texts live in heterogeneous spaces, making it difficult to preserve semantic consistency between the two modalities. To narrow the modality gap, most recent methods resort to extra object detectors or parsers to obtain hierarchical representations. In this work, we address this problem by introducing a Multi-Task Hierarchical Convolutional Neural Network (MT-HCN), which mines hierarchical semantic information without any extra supervision. First, from the architectural perspective, we leverage the intrinsic hierarchical structure of convolutional neural networks (CNNs) to decompose the representations of both modalities into two semantically complementary levels, i.e., exterior representations and concept representations. The former focuses on discovering fine-grained low-level associations between the modalities, while the latter captures more abstract high-level semantics. Specifically, we present a Self-Supervised Clustering (SSC) loss to preserve fine-grained semantic clues in the exterior representations; it is built by treating multiple image/text pairs with similar exteriors as one category. In addition, a novel harmonious bidirectional triplet ranking (HBTR) loss is proposed, which mitigates the adverse effects of biased and noisy negative samples. Besides the hardest negatives, it also constrains the distance between positive pairs and the centroid of the negative pairs. Extensive experimental results on two popular cross-modal retrieval benchmarks demonstrate that our proposed MT-HCN achieves competitive results compared with state-of-the-art methods. 
• This paper proposes a novel Multi-Task Hierarchical Convolutional Network (MT-HCN) for visual-semantic cross-modal retrieval, characterized by adopting a classification task to improve hierarchical multi-modal representation learning. • This paper proposes a novel Self-Supervised Clustering (SSC) loss to learn exterior representations that fully exploit low-level fine-grained correlations for associating images and texts. • This paper presents an effective bidirectional ranking loss, harmonious bidirectional triplet ranking (HBTR), for cross-modal correlation preservation. It not only helps seek out more representative hard negative samples, but also leverages the category center of the negatives to enhance the robustness of cross-modal representations. • Extensive experiments on two benchmark datasets validate the superiority of our proposed model over state-of-the-art approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
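The centroid-of-negatives idea in the ranking loss described above can be sketched in a few lines. The margin, the hinge form, and the way the two terms are combined are illustrative guesses, not the paper's exact HBTR loss:

```python
def hbtr_style_loss(d_pos, d_negs, margin=0.2):
    """Toy sketch of a harmonious ranking loss: a standard hinge on the
    hardest negative plus a hinge against the centroid of all negatives,
    which dampens the pull of a single noisy 'hardest' negative.
    d_pos is the distance of the positive pair; d_negs the negatives'."""
    def hinge(d_neg):
        return max(0.0, margin + d_pos - d_neg)
    hardest = min(d_negs)                  # hardest-negative term
    centroid = sum(d_negs) / len(d_negs)   # centroid-of-negatives term
    return hinge(hardest) + hinge(centroid)
```

If one negative is mislabeled and sits very close to the anchor, the centroid term still reflects the bulk of the negatives, so the gradient is not dominated by that single noisy sample.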
12. SSIR: Spatial shuffle multi-head self-attention for Single Image Super-Resolution.
- Author
- Zhao, Liangliang, Gao, Junyu, Deng, Donghu, and Li, Xuelong
- Subjects
- HIGH resolution imaging, CONVOLUTIONAL neural networks, TRANSFORMER models, MNEMONICS, LARVAL dispersal
- Abstract
Benefiting from the development of deep convolutional neural networks, CNN-based single-image super-resolution methods have achieved remarkable reconstruction results. However, the limited receptive field of the convolutional kernel and the use of static weights at inference limit the performance of CNN-based methods. Recently, a few vision-transformer-based image super-resolution methods have achieved excellent performance compared with CNN-based methods, but they contain many parameters and require vast amounts of GPU memory for training. In this paper, we propose a spatial shuffle multi-head self-attention for single-image super-resolution that can model long-range pixel dependencies without additional computational cost. A local perception module is also proposed to incorporate the local connectivity and translational invariance of convolutional neural networks. Reconstruction results on five popular benchmarks show that the proposed method outperforms existing methods in both reconstruction accuracy and visual quality. The proposed method matches the performance of transformer-based methods while requiring fewer transformer blocks, reducing the number of parameters by 40%, GPU memory by 30%, and inference time by 30% compared to transformer-based methods. • We used attribution analysis to find that some transformer-based SR methods can only utilize information from a limited spatial range during reconstruction. • To address this, we introduce Spatial Shuffle Multi-Head Self-Attention (SS-MSA) for efficient global pixel-dependency modeling and a local perceptual unit to enhance local feature information. • Our method surpasses existing approaches in reconstruction accuracy and visual quality across five benchmarks. Moreover, it reduces parameters by 40%, GPU memory by 30%, and inference time by 30% compared to transformer-based methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
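The spatial-shuffle trick in the abstract above amounts to a permutation of token positions before windowed attention, so each local window mixes pixels that were originally far apart. A 1-D sketch of one plausible shuffle and its inverse follows (the actual SS-MSA permutation may differ):

```python
def spatial_shuffle(tokens, stride):
    """Interleave positions so that 0, s, 2s, ... come first, then
    1, s+1, ...: a windowed attention over the shuffled sequence then
    attends across originally distant positions at no extra cost."""
    return [tokens[i]
            for r in range(stride)
            for i in range(r, len(tokens), stride)]

def spatial_unshuffle(tokens, stride):
    """Inverse permutation restoring the original spatial order, applied
    after attention so the feature map keeps its geometry."""
    out = [None] * len(tokens)
    idx = 0
    for r in range(stride):
        for i in range(r, len(tokens), stride):
            out[i] = tokens[idx]
            idx += 1
    return out
```

Because shuffle and unshuffle are exact inverses, the operation adds no parameters and no floating-point work beyond the permutation itself.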
13. Distance-based Weighted Transformer Network for image completion.
- Author
- Shamsolmoali, Pourya, Zareapoor, Masoumeh, Zhou, Huiyu, Li, Xuelong, and Lu, Yue
- Subjects
- TRANSFORMER models, CONVOLUTIONAL neural networks
- Abstract
The challenge of image generation has been effectively modeled as a problem of structure priors or transformation. However, existing models perform unsatisfactorily in understanding global input image structures because of inherent properties such as local inductive priors. Recent studies have shown that self-attention is an efficient modeling technique for image completion. In this paper, we propose a new architecture that relies on a Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components. In our model, we leverage the strengths of both convolutional neural networks (CNNs) and DWT blocks to enhance the image completion process. Specifically, CNNs are used to augment the local texture information of coarse priors, and DWT blocks are used to recover certain coarse textures and coherent visual structures. Unlike current approaches that generally use CNNs to create feature maps, we use the DWT to encode global dependencies and compute distance-based weighted feature maps, which substantially reduces visual ambiguity. Meanwhile, to better reproduce repeated textures, we introduce Residual Fast Fourier Convolution (Res-FFC) blocks to combine the encoder's skip features with the coarse features produced by our generator. Furthermore, a simple yet effective technique is proposed to normalize the non-zero values of convolutions and fine-tune the network layers through regularization of the gradient norms, providing an efficient training stabilizer. Extensive quantitative and qualitative experiments on three challenging datasets demonstrate the superiority of our proposed model over existing approaches. • Proposed a generative model for coherent image completion. • Proposed a Distance-based Weighted Transformer to encode global dependencies. • Proposed a norm-regularization method to stabilize training. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
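One simple way to realize "distance-based weighted" attention, in the spirit of the abstract above, is to penalize raw attention logits by spatial distance before the softmax. The linear penalty and numerically stable softmax below are an illustrative guess at the idea, not the paper's DWT block:

```python
import math

def dw_attention_weights(raw_scores, dists, alpha=1.0):
    """Softmax over raw attention scores penalized by spatial distance
    (logit = score - alpha * distance): nearby tokens dominate unless a
    distant token is strongly relevant. Uses the max-subtraction trick
    for numerical stability."""
    logits = [s - alpha * d for s, d in zip(raw_scores, dists)]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]
```

With equal raw scores, the weights decay monotonically with distance, which is the behavior that helps disambiguate completions near a hole's boundary.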
Discovery Service for Jio Institute Digital Library