8 results for "Zhao, Jiaqi"
Search Results
2. TRF-Net: a transformer-based RGB-D fusion network for desktop object instance segmentation.
- Author
- Cao, He, Zhang, Yunzhou, Shan, Dexing, Liu, Xiaozheng, and Zhao, Jiaqi
- Subjects
- BRANCHING processes, IMAGE segmentation, ROBOTS
- Abstract
To perform object-specific tasks on the desktop, robots need to perceive different objects. The challenge is to calculate the pixel-wise mask for each object, even in the presence of occlusions and unseen objects. We take a step toward this problem by proposing a metric learning-based network called TRF-Net to perform desktop object instance segmentation. We design two ResNet-based branches to process the RGB and depth images separately. Then, we propose a Transformer-based fusion module called TranSE to fuse the features from both branches. This module also transfers the fused features to the decoder part, which helps generate fine-grained decoder features. After that, we propose a multi-scale feature embedding loss function called MFE loss to reduce the intra-class distance and increase the inter-class distance, which contributes to the feature clustering in embedding space. Due to the lack of large-scale real-world datasets for desktop objects, the proposed TRF-Net is trained on a synthetic dataset and tested on a small-scale real-world dataset. The target objects in the testing dataset do not appear in the training dataset, ensuring the novelty of testing objects. We demonstrate that our method can produce accurate instance segmentation masks, outperforming other state-of-the-art methods on desktop object instance segmentation. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
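The pull/push idea behind a metric-learning embedding loss such as the MFE loss above can be sketched in a few lines. This is a minimal numpy sketch of the generic intra-class/inter-class objective, not the paper's multi-scale formulation; the function name and the margins `delta_v`/`delta_d` are illustrative assumptions.

```python
import numpy as np

def embedding_loss(features, labels, delta_v=0.5, delta_d=1.5):
    """Pull pixel embeddings toward their instance mean, push instance
    means apart.

    features: (N, D) per-pixel embeddings; labels: (N,) instance ids.
    delta_v / delta_d are illustrative margins, not values from the paper.
    """
    ids = np.unique(labels)
    means = np.stack([features[labels == i].mean(axis=0) for i in ids])

    # Intra-class (pull) term: penalize pixels farther than delta_v
    # from their own instance mean.
    pull = 0.0
    for k, i in enumerate(ids):
        d = np.linalg.norm(features[labels == i] - means[k], axis=1)
        pull += np.mean(np.maximum(d - delta_v, 0.0) ** 2)
    pull /= len(ids)

    # Inter-class (push) term: penalize pairs of instance means that
    # are closer than delta_d.
    push = 0.0
    if len(ids) > 1:
        for a in range(len(ids)):
            for b in range(a + 1, len(ids)):
                d = np.linalg.norm(means[a] - means[b])
                push += np.maximum(delta_d - d, 0.0) ** 2
        push /= len(ids) * (len(ids) - 1) / 2
    return pull + push
```

Well-separated, tight clusters drive both terms to zero, which is exactly the clustering-in-embedding-space behavior the abstract describes.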
3. Multi-source collaborative enhanced for remote sensing images semantic segmentation.
- Author
- Zhao, Jiaqi, Zhang, Di, Shi, Boyu, Zhou, Yong, Chen, Jingyang, Yao, Rui, and Xue, Yong
- Subjects
- REMOTE sensing, IMAGE segmentation, GROUND cover plants, PROBLEM solving
- Abstract
Semantic segmentation of remote sensing images is a difficult instance of image understanding. Due to the regional variability and uncertainty of real-world ground cover features, it remains a challenging task. In this paper, we propose an end-to-end multi-source remote sensing image semantic segmentation network (MCENet) to address the problems of intra-class inconsistency and inter-class indistinguishability in remote sensing images. Firstly, we design a collaborative enhanced fusion module to mine complementary characteristics of multi-source remote sensing images. Within it, the collaborative fusion module is used to solve the problem of intra-class difference, and the enhanced aggregation module is used to solve the problem of inter-class similarity. Secondly, a multi-scale decoder is proposed to improve the robustness of the model for small targets and large-scale changes by learning scale-invariant features. Experimental results show that our method achieved 2.2% and 1.11% mean intersection over union (mIoU) score improvements compared with other methods on the US3D and ISPRS Potsdam data sets, respectively. In addition, the method proposed in this paper is also strongly competitive in terms of parameter count and inference speed. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
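One common way to mine complementary characteristics from two sources, as the fusion module above aims to do, is a learned per-pixel gate that mixes the two feature maps. This is a generic numpy sketch of gated fusion, not MCENet's actual module; the function name and the weight shapes (standing in for a 1x1 convolution) are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(feat_a, feat_b, w, b):
    """Fuse two modality feature maps with a per-pixel gate.

    feat_a, feat_b: (H, W, C) features from two sources (e.g. optical / DSM).
    w: (2C, C) projection weights, b: (C,) bias -- stand-ins for a 1x1 conv.
    """
    stacked = np.concatenate([feat_a, feat_b], axis=-1)   # (H, W, 2C)
    gate = sigmoid(stacked @ w + b)                       # (H, W, C), in (0, 1)
    # Convex combination: each output channel lies between its two inputs.
    return gate * feat_a + (1.0 - gate) * feat_b
```

Because the gate is computed from both inputs, each source can dominate where it is more informative, which is the intuition behind collaborative fusion of multi-source imagery.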
4. Swin Transformer Embedding UNet for Remote Sensing Image Semantic Segmentation.
- Author
- He, Xin, Zhou, Yong, Zhao, Jiaqi, Zhang, Di, Yao, Rui, and Xue, Yong
- Subjects
- REMOTE sensing, CONVOLUTIONAL neural networks, QUANTUM networks (Optics)
- Abstract
Global context information is essential for the semantic segmentation of remote sensing (RS) images. However, most existing methods rely on a convolutional neural network (CNN), which struggles to capture global context directly due to the locality of the convolution operation. Inspired by the Swin transformer with powerful global modeling capabilities, we propose a novel semantic segmentation framework for RS images called ST-U-shaped network (UNet), which embeds the Swin transformer into the classical CNN-based UNet. ST-UNet constitutes a novel dual encoder structure of the Swin transformer and CNN in parallel. First, we propose a spatial interaction module (SIM), which encodes spatial information in the Swin transformer block by establishing pixel-level correlation to enhance the feature representation ability of occluded objects. Second, we construct a feature compression module (FCM) to reduce the loss of detailed information and condense more small-scale features in patch token downsampling of the Swin transformer, which improves the segmentation accuracy of small-scale ground objects. Finally, as a bridge between dual encoders, a relational aggregation module (RAM) is designed to integrate global dependencies from the Swin transformer into the features from CNN hierarchically. Our ST-UNet brings significant improvement on the ISPRS-Vaihingen and Potsdam datasets. The code will be available at https://github.com/XinnHe/ST-UNet. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
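The bridging idea above, injecting global transformer features into local CNN features at a matching scale, can be reduced to a projection plus a residual add. This is a deliberately minimal numpy sketch of that pattern, not the paper's RAM; the function name and the linear projection are illustrative assumptions.

```python
import numpy as np

def aggregate_global(cnn_feat, trans_tokens, proj):
    """Add projected transformer tokens onto a CNN feature map.

    cnn_feat: (H, W, C) local features from the CNN branch.
    trans_tokens: (H*W, D) flattened tokens from the transformer branch.
    proj: (D, C) linear map aligning token channels with CNN channels.
    """
    h, w, c = cnn_feat.shape
    global_feat = (trans_tokens @ proj).reshape(h, w, c)  # back to spatial grid
    return cnn_feat + global_feat                         # residual fusion
```

Repeating this at each encoder stage (with stage-specific projections) is one simple way to realize the "hierarchical" integration the abstract describes.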
5. A survey of semi- and weakly supervised semantic segmentation of images.
- Author
- Zhang, Man, Zhou, Yong, Zhao, Jiaqi, Man, Yiyun, Liu, Bing, and Yao, Rui
- Subjects
CONVOLUTIONAL neural networks ,IMAGE segmentation ,COMPUTER vision ,ARTIFICIAL neural networks ,VISUAL fields ,SUPERVISED learning ,DEEP learning - Abstract
Image semantic segmentation is one of the most important tasks in the field of computer vision, and it has made great progress in many applications. Many fully supervised deep learning models are designed to implement complex semantic segmentation tasks, and the experimental results are remarkable. However, the acquisition of pixel-level labels for fully supervised learning is time-consuming and laborious, so semi-supervised and weakly supervised learning are gradually replacing it, achieving good results at a lower cost. Based on commonly used models such as convolutional neural networks, fully convolutional networks, and generative adversarial networks, this paper focuses on the core methods and reviews the semi- and weakly supervised semantic segmentation models of recent years. In the following chapters, existing evaluations and data sets are summarized in detail and the experimental results are analyzed according to the data set. The last part of the paper is an objective summary; in addition, it points out possible research directions and offers suggestions for future work. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
6. Superpixel-Based Multiple Local CNN for Panchromatic and Multispectral Image Classification.
- Author
- Zhao, Wei, Jiao, Licheng, Ma, Wenping, Zhao, Jiaqi, Zhao, Jin, Liu, Hongying, Cao, Xianghai, and Yang, Shuyuan
- Subjects
PIXELS ,IMAGE analysis ,HIGH resolution imaging ,MULTISPECTRAL imaging ,REMOTE-sensing images - Abstract
Very high resolution (VHR) panchromatic and multispectral (MS) remote-sensing images can now be acquired easily. However, it is still a challenging task to fuse and classify these VHR images. Generally, there are two ways to fuse and classify panchromatic and MS images. One way is to use a panchromatic image to sharpen an MS image, and then classify the pan-sharpened MS image. Another way is to extract features from the panchromatic and MS images, respectively, and then combine these features for classification. In this paper, we propose a superpixel-based multiple local convolution neural network (SML-CNN) model for panchromatic and MS image classification. In order to reduce the amount of input data for the CNN, we extend the simple linear iterative clustering (SLIC) algorithm to segment MS images and generate superpixels. Superpixels are taken as the basic analysis unit instead of pixels. To take full advantage of the spatial-spectral and environment information of superpixels, a superpixel-based multiple local regions joint representation method is proposed. Then, an SML-CNN model is established to extract an efficient joint feature representation. A softmax layer is used to classify the features learned by the multiple local CNN into different categories. Finally, in order to eliminate adverse effects on the classification results within and between superpixels, we propose a multi-information modification strategy that combines detailed information and semantic information to improve the classification performance. Experiments on the classification of the Vancouver and Xi'an panchromatic and MS image data sets have demonstrated the effectiveness of the proposed approach. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
- Full Text
- View/download PDF
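Taking superpixels rather than pixels as the analysis unit, as above, amounts to segmenting the image, locating each superpixel, and cropping a local region around it for the CNN. This is a toy numpy sketch of that pipeline: a regular grid stands in for SLIC, and both function names are illustrative assumptions.

```python
import numpy as np

def grid_superpixels(h, w, step):
    """Toy stand-in for SLIC: label pixels by a regular grid of blocks."""
    rows = np.arange(h)[:, None] // step
    cols = np.arange(w)[None, :] // step
    return rows * ((w + step - 1) // step) + cols

def centroid_patches(image, seg, size):
    """Crop a (size, size) window around each superpixel centroid.

    Windows are clamped to the image border so every crop is full-sized.
    """
    half = size // 2
    patches = {}
    for sp in np.unique(seg):
        ys, xs = np.nonzero(seg == sp)
        cy, cx = int(ys.mean()), int(xs.mean())
        y0 = min(max(cy - half, 0), image.shape[0] - size)
        x0 = min(max(cx - half, 0), image.shape[1] - size)
        patches[sp] = image[y0:y0 + size, x0:x0 + size]
    return patches
```

In the paper's setting several such local regions per superpixel feed parallel CNN branches; here a single centroid crop illustrates the basic unit of input.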
7. Edge-aware and spectral–spatial information aggregation network for multispectral image semantic segmentation.
- Author
- Zhang, Di, Zhao, Jiaqi, Chen, Jingyang, Zhou, Yong, Shi, Boyu, and Yao, Rui
- Subjects
- MULTISPECTRAL imaging, IMAGE segmentation, INFORMATION networks, REMOTE sensing, IMAGE analysis, COMPUTER vision, MARKOV random fields, FUZZY algorithms
- Abstract
Semantic segmentation is a fundamental task in the field of remote sensing image intelligent interpretation and computer vision. Multispectral remote sensing images have attracted increasing attention from researchers because they can accurately describe different types of reflection spectra. However, inaccurate multispectral feature description leads to edge semantic ambiguity and misclassification of small objects. In this article, we propose a novel network named edge-aware and spectral–spatial information aggregation net (ESSANet) to capture both high-level semantic features and low-level edge details for semantic segmentation of remote sensing images. Specifically, on the one hand, in order to improve the representation ability of discriminant features, we design a two-stream spectral–spatial feature extraction network via 3D hybrid convolutions and a multi-level aggregation network. On the other hand, in order to eliminate the effect of edge semantic ambiguity, we develop a siamese edge-aware structure and a multi-stage edge loss function. Experimental results show that our method achieved 3.5% and 4.09% mean intersection over union (mIoU) score improvements and 2.59% and 3.32% Kappa score improvements compared with the competitive baseline algorithm on the SEN12MS and US3D datasets, respectively. In addition, the method proposed in this paper also achieves a better trade-off between speed and accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
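An edge loss of the kind described above needs two ingredients: an edge target derived from the segmentation labels, and a loss comparing predicted edge probabilities against it. This is a minimal numpy sketch of that generic recipe, not ESSANet's multi-stage formulation; the function names are illustrative assumptions.

```python
import numpy as np

def boundary_map(mask):
    """1 where a pixel's label differs from its right or lower neighbour."""
    edge = np.zeros_like(mask, dtype=bool)
    edge[:, :-1] |= mask[:, :-1] != mask[:, 1:]
    edge[:-1, :] |= mask[:-1, :] != mask[1:, :]
    return edge.astype(np.float64)

def edge_bce(pred, mask, eps=1e-7):
    """Binary cross-entropy between predicted edge probs and label edges."""
    target = boundary_map(mask)
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(pred)
                          + (1 - target) * np.log(1 - pred)))
```

A multi-stage variant would apply `edge_bce` to edge predictions taken from several decoder stages and sum the terms, supervising edges at multiple resolutions.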
8. Remote sensing image semantic segmentation via class-guided structural interaction and boundary perception.
- Author
- He, Xin, Zhou, Yong, Liu, Bing, Zhao, Jiaqi, and Yao, Rui
- Subjects
- IMAGE segmentation, COMPLEX variables
- Abstract
Existing remote sensing semantic segmentation methods generally ignore the structural information of objects that is vital in the human visual recognition system. The absence of overall structural information often results in weak perception of subtle textures and fragmented predictions, especially for complex and variable ground object scenarios. Besides, these methods still suffer from the semantic ambiguity caused by unclear object boundary features in remote sensing images. In this paper, we propose a novel remote sensing semantic segmentation framework, called CSBNet, which aims to enhance the capacity of class-guided structural interaction and boundary perception simultaneously. It consists of a class-guided structure interaction module (CSIM), a Transformer-based context aggregation module (TCAM) and a class-guided boundary supervision module (CBSM). The CSIM has the ability to progressively extract class-specific structural features, i.e., refining the structural information of each class by iteratively exchanging information between initial coarse class tokens and contexts. Meanwhile, the TCAM is constructed to provide CSIM with more discriminative multi-scale contexts without losing spatial features. In particular, the CBSM plays an auxiliary role, applying the boundary information obtained from the class tokens to supervise the segmentation of boundary regions. When tested on the ISPRS, LoveDA, and UAVid datasets, our method significantly outperforms state-of-the-art remote sensing semantic segmentation approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
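The token-context exchange described above, where one token per class gathers information from image features, is at heart a cross-attention step. This is a single-head numpy sketch with no learned projections (a deliberate simplification, and the function name is an illustrative assumption), showing one round of the iterative refinement.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # stabilized
    return e / e.sum(axis=axis, keepdims=True)

def class_token_attention(tokens, context):
    """One cross-attention update of class tokens against context features.

    tokens: (C, D) one token per class; context: (N, D) flattened features.
    Each token gathers a similarity-weighted mix of the context, added
    back residually -- iterating this refines the per-class structure.
    """
    scale = tokens.shape[-1] ** -0.5
    attn = softmax(tokens @ context.T * scale, axis=-1)  # (C, N), rows sum to 1
    return tokens + attn @ context                       # residual update
```

A full module would wrap this in learned query/key/value projections and normalization, and alternate it with updates of the context itself.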
Discovery Service for Jio Institute Digital Library