18 results for "Xu, Tianyang"
Search Results
2. Learning Feature Restoration Transformer for Robust Dehazing Visual Object Tracking.
- Author
- Xu, Tianyang, Pan, Yifan, Feng, Zhenhua, Zhu, Xuefeng, Cheng, Chunyang, Wu, Xiao-Jun, and Kittler, Josef
- Subjects
- TRANSFORMER models; FEATURE extraction; WEATHER; DATA distribution; VIDEOS; OBJECT tracking (Computer vision)
- Abstract
In recent years, deep-learning-based visual object tracking has achieved promising results. However, a drastic performance drop is observed when transferring a pre-trained model to changing weather conditions, such as hazy imaging scenarios, where the data distribution differs from that of a natural training set. This problem challenges the open-world practical applications of accurate target tracking. In principle, visual tracking performance relies on how discriminative the features are between the target and its surroundings, rather than on image-level visual quality. To this end, we design a feature restoration transformer that adaptively enhances the representation capability of the extracted visual features for robust tracking in both natural and hazy scenarios. Specifically, the feature restoration transformer is constructed with dedicated self-attention hierarchies for the refinement of potentially contaminated deep feature maps. We endow the feature extraction process with a refinement mechanism tailored to hazy imaging scenarios, establishing a tracking system that is robust against foggy videos. In essence, the feature restoration transformer is jointly trained with a Siamese tracking transformer. Intuitively, the supervision for learning discriminative and salient features is facilitated by the entire restoration tracking system. The experimental results obtained on hazy imaging scenarios demonstrate the merits and superiority of the proposed restoration tracking system, with restoration power complementary to image-level dehazing. In addition, consistent advantages of our design are observed when generalised to different video attributes, demonstrating its capacity to deal with open-world scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2024
3. Feature enhancement and coarse-to-fine detection for RGB-D tracking.
- Author
- Zhu, Xue-Feng, Xu, Tianyang, Wu, Xiao-Jun, and Kittler, Josef
- Subjects
- BOOSTING algorithms; FEATURE extraction; TRACKING algorithms; ARTIFICIAL satellite tracking
- Abstract
Existing RGB-D tracking algorithms advance performance by constructing typical appearance models from RGB-only tracking frameworks, with little attempt to exploit complementary visual information from the multi-modal input. This paper addresses this deficit and presents a novel algorithm that boosts the performance of RGB-D tracking by taking advantage of collaborative clues. To guarantee input consistency, depth images are encoded into the three-channel HHA representation to create an input of similar structure to the RGB images, so that deep CNN features can be extracted from both modalities. To highlight the discriminatory information in the multi-modal features, a feature enhancement module using a cross-attention strategy is proposed. With the attention map produced by the proposed cross-attention method, the target area of the features can be enhanced and the negative influence of the background suppressed. In addition, we address potential tracking failure by introducing a long-term mechanism. The experimental results obtained on the well-known benchmarking datasets, including PTB, STC, and CDTB, demonstrate the superiority of the proposed RGB-D tracker. On PTB, the proposed method achieves the highest AUC scores against the compared trackers across scenarios with five distinct challenging attributes. On STC and CDTB, our FECD obtains an overall AUC of 0.630 and an F-score of 0.630, respectively. • The single-channel depth maps were encoded into three-channel HHA images. • A feature enhancement method with a cross-attention module to enhance features. • A long-term tracking mechanism to detect failures and recapture lost targets. • Experiments were conducted on several standard tracking benchmarking datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
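A minimal sketch of the cross-attention feature enhancement idea described in the abstract above: RGB features act as queries against the HHA-encoded depth features, and the resulting attention map re-weights the shared representation. The module name, tensor shapes, and residual design are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): cross-attention between RGB and
# HHA-encoded depth features to enhance the target region and suppress background.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalEnhance(nn.Module):
    def __init__(self, channels: int = 256):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, 1)   # queries from RGB features
        self.k = nn.Conv2d(channels, channels, 1)   # keys from depth (HHA) features
        self.v = nn.Conv2d(channels, channels, 1)   # values from depth (HHA) features

    def forward(self, rgb_feat: torch.Tensor, hha_feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = rgb_feat.shape
        q = self.q(rgb_feat).flatten(2).transpose(1, 2)   # (B, HW, C)
        k = self.k(hha_feat).flatten(2)                   # (B, C, HW)
        v = self.v(hha_feat).flatten(2).transpose(1, 2)   # (B, HW, C)
        attn = F.softmax(q @ k / c ** 0.5, dim=-1)        # (B, HW, HW) cross-attention map
        enhanced = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return rgb_feat + enhanced                        # residual enhancement of the RGB branch

# toy usage with fabricated shapes
rgb = torch.randn(1, 256, 20, 20)
hha = torch.randn(1, 256, 20, 20)
out = CrossModalEnhance()(rgb, hha)
```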
4. ETM-face: effective training sample selection and multi-scale feature learning for face detection.
- Author
- He, Junyuan, Song, Xiaoning, Feng, Zhenhua, Xu, Tianyang, Wu, Xiaojun, and Kittler, Josef
- Subjects
- DATA augmentation; PYRAMIDS; FEATURE extraction
- Abstract
In recent years, deep-learning-based face detectors have achieved promising results and have been successfully used in a wide range of practical applications. However, extreme appearance variations remain a major obstacle to robust and accurate face detection in the wild. To address this issue, we propose an Improved Training Sample Selection (ITSS) strategy for mining effective positive and negative samples during network training. The proposed ITSS procedure collaborates with face sampling during data augmentation and selects suitable positive sample centres and IoU overlaps for face detection. Moreover, we propose a Residual Feature Pyramid Fusion (RFPF) module that collects semantically robust features to improve the scale-invariance of deep features and better represent faces at different feature pyramid levels. The experimental results obtained on the FDDB and WiderFace datasets demonstrate the superiority of the proposed method over the state-of-the-art approaches. Specifically, the proposed method achieves 96.9% and 96.2% AP on the easy and medium test sets of WiderFace. [ABSTRACT FROM AUTHOR]
- Published
- 2023
5. MUFusion: A general unsupervised image fusion network based on memory unit.
- Author
- Cheng, Chunyang, Xu, Tianyang, and Wu, Xiao-Jun
- Subjects
- IMAGE fusion; FEATURE extraction; MEMORY loss; INFRARED imaging; MEMORY; EVOLUTIONARY algorithms; KNAPSACK problems
- Abstract
Existing image fusion approaches are committed to using a single deep network to solve different image fusion problems, and have achieved promising performance in recent years. However, in the absence of a ground-truth output, these methods can only exploit the appearance of the source images during the training process to generate the fused images, resulting in suboptimal solutions. To this end, we advocate a self-evolutionary training formulation by introducing a novel memory unit architecture (MUFusion). Specifically, in this unit we utilize the intermediate fusion results obtained during the training process to collaboratively supervise the fused image. In this way, our fusion results can not only learn from the original input images, but also benefit from the intermediate output of the network itself. Furthermore, an adaptive unified loss function is designed based on this memory unit, which is composed of two loss items, i.e., content loss and memory loss. In particular, the content loss is calculated based on the activity level maps of the source images, which constrain the output image to contain specific information. On the other hand, the memory loss is obtained based on the previous output of our model, which is utilized to force the network to yield fusion results of higher quality. Considering that handcrafted activity level maps cannot consistently reflect accurate salience judgements, we place two adaptive weight items between them to prevent this degradation. In general, our MUFusion can effectively handle a series of image fusion tasks, including infrared and visible image fusion, multi-focus image fusion, multi-exposure image fusion, and medical image fusion. Specifically, the source images are concatenated in the channel dimension. After that, a densely connected feature extraction network with two scales is used to extract the deep features of the source images. Following this, the fusion result is obtained by two feature reconstruction blocks with skip connections from the feature extraction network. Qualitative and quantitative experiments on 4 image fusion subtasks demonstrate the superiority of our MUFusion, compared to the state-of-the-art methods. • Memory units based on the intermediate fusion results are proposed to boost the fusion performance. • Activity level maps are introduced in the loss function. • A deep network is designed to accomplish the fusion task in an end-to-end manner. • As a general model, the proposed method can solve various fusion tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2023
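A rough sketch of the two-term loss described in the MUFusion abstract above: a content loss weighted by activity-level maps of the source images plus a memory loss against the network's own earlier output. The specific activity-level estimator, the adaptive weight form, and the function names are assumptions for illustration, not the published formulation.

```python
# Hypothetical sketch of a content + memory loss in the spirit of MUFusion.
# The activity-level maps and adaptive weights below are illustrative guesses.
import torch
import torch.nn.functional as F

def activity_level(x: torch.Tensor) -> torch.Tensor:
    """Crude activity-level map: local average of absolute intensity."""
    return F.avg_pool2d(x.abs(), kernel_size=3, stride=1, padding=1)

def mufusion_style_loss(fused, src_a, src_b, previous_output, w_mem: float = 0.5):
    # content loss: pixels are pulled towards whichever source is more "active"
    act_a, act_b = activity_level(src_a), activity_level(src_b)
    w_a = act_a / (act_a + act_b + 1e-8)          # adaptive per-pixel weights (assumed form)
    w_b = 1.0 - w_a
    content = (w_a * (fused - src_a) ** 2 + w_b * (fused - src_b) ** 2).mean()
    # memory loss: supervise the current output with the intermediate result
    # produced by the same network earlier in training (detached, no gradient)
    memory = F.l1_loss(fused, previous_output.detach())
    return content + w_mem * memory

# toy usage with fabricated tensors
a, b = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
fused = torch.rand(1, 1, 64, 64, requires_grad=True)
prev = torch.rand(1, 1, 64, 64)
loss = mufusion_style_loss(fused, a, b, prev)
loss.backward()
```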
6. Learning Structure-Supporting Dependencies via Keypoint Interactive Transformer for General Mammal Pose Estimation.
- Author
- Xu, Tianyang, Rao, Jiyong, Song, Xiaoning, Feng, Zhenhua, and Wu, Xiao-Jun
- Subjects
- COMPUTER vision; FEATURE extraction; MAMMALS
- Abstract
General mammal pose estimation is an important and challenging task in computer vision, and is essential for understanding mammal behaviour in real-world applications. However, existing studies are still at a preliminary research stage, focusing on only a few specific mammal species. In principle, in moving from specific to general mammal pose estimation, the biggest issue is how to address the huge appearance and pose variations across different species. We argue that, given the appearance context, the instance-level prior and the structural relations among keypoints can serve as complementary evidence. To this end, we propose a Keypoint Interactive Transformer (KIT) to learn instance-level structure-supporting dependencies for general mammal pose estimation. Specifically, our KITPose consists of two coupled components. The first component extracts keypoint features and generates body part prompts. The features are supervised by a dedicated generalised heatmap regression loss (GHRL). Instead of introducing external visual/text prompts, we devise keypoint clustering to generate body part biases, aligning them with the image context to generate the corresponding instance-level prompts. Second, we propose a novel interactive transformer that takes feature slices as input tokens without performing spatial splitting. In addition, to enhance the capability of the KIT model, we design an adaptive weight strategy to address the imbalance among different keypoints. Extensive experimental results obtained on the widely used animal datasets, AP10K and AnimalKingdom, demonstrate the superiority of the proposed method over the state-of-the-art approaches. It achieves 77.9 AP on the AP10K val set, outperforming HRFormer by 2.2. Moreover, our KITPose can be directly transferred to human pose estimation with promising results, as evaluated on COCO, reflecting the merits of constructing structure-supporting architectures for general mammal pose estimation. [ABSTRACT FROM AUTHOR]
- Published
- 2025
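One way to read the "feature slices as input tokens" idea in the KITPose abstract above is self-attention across per-keypoint feature slices rather than across spatial patches. The sketch below follows that reading with made-up dimensions and a plain `nn.MultiheadAttention`; it should not be taken as the published architecture.

```python
# Hypothetical sketch: treat each keypoint's feature slice as one token and let
# the tokens interact through self-attention (no spatial patch splitting).
import torch
import torch.nn as nn

class KeypointInteraction(nn.Module):
    def __init__(self, num_keypoints: int = 17, token_dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(token_dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(token_dim)

    def forward(self, kpt_tokens: torch.Tensor) -> torch.Tensor:
        # kpt_tokens: (B, K, D) -- one embedding per keypoint, e.g. pooled from
        # the keypoint-specific feature slice produced by the backbone
        out, _ = self.attn(kpt_tokens, kpt_tokens, kpt_tokens)
        return self.norm(kpt_tokens + out)    # residual + norm, standard transformer style

tokens = torch.randn(2, 17, 256)              # fabricated batch of keypoint tokens
refined = KeypointInteraction()(tokens)
```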
7. TENet: Targetness entanglement incorporating with multi-scale pooling and mutually-guided fusion for RGB-E object tracking.
- Author
- Shao, Pengcheng, Xu, Tianyang, Tang, Zhangyong, Li, Linze, Wu, Xiao-Jun, and Kittler, Josef
- Subjects
- FEATURE extraction; MOTION capture (Human mechanics); CAMERAS; SPINE; SUCCESS
- Abstract
There is currently strong interest in improving visual object tracking by augmenting the RGB modality with the output of a visual event camera that is particularly informative about the scene motion. However, existing approaches perform event feature extraction for RGB-E tracking using traditional appearance models, which have been optimised for RGB-only tracking, without adapting them to the intrinsic characteristics of the event data. To address this problem, we propose an Event backbone (Pooler), designed to obtain a high-quality feature representation that is cognisant of the innate characteristics of the event data, namely its sparsity. In particular, Multi-Scale Pooling is introduced to capture all the motion feature trends within the event data through the utilisation of diverse pooling kernel sizes. The association between the derived RGB and event representations is established by an innovative module performing adaptive Mutually Guided Fusion (MGF). Extensive experimental results show that our method significantly outperforms state-of-the-art trackers on two widely used RGB-E tracking datasets, including VisEvent and COESOT, where the precision and success rates on COESOT are improved by 4.9% and 5.2%, respectively. Our code will be available at https://github.com/SSSpc333/TENet. [ABSTRACT FROM AUTHOR]
- Published
- 2025
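A small sketch of the multi-scale pooling idea from the TENet abstract above: the sparse event features are pooled with several kernel sizes and the results merged. The kernel sizes, the 1x1 fusion convolution, and the module name are assumptions for illustration rather than the paper's design.

```python
# Hypothetical sketch: capture motion trends in sparse event features by pooling
# with several kernel sizes and fusing the pooled maps.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScalePooling(nn.Module):
    def __init__(self, channels: int = 256, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.kernel_sizes = kernel_sizes
        # 1x1 convolution to merge the original and pooled feature maps
        self.fuse = nn.Conv2d(channels * (len(kernel_sizes) + 1), channels, 1)

    def forward(self, event_feat: torch.Tensor) -> torch.Tensor:
        pooled = [event_feat]
        for k in self.kernel_sizes:
            # stride 1 + padding keeps the spatial resolution unchanged
            pooled.append(F.avg_pool2d(event_feat, kernel_size=k, stride=1, padding=k // 2))
        return self.fuse(torch.cat(pooled, dim=1))

feat = torch.randn(1, 256, 16, 16)            # fabricated event feature map
out = MultiScalePooling()(feat)               # same shape, multi-scale context mixed in
```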
8. Self-supervised learning for RGB-D object tracking.
- Author
- Zhu, Xue-Feng, Xu, Tianyang, Atito, Sara, Awais, Muhammad, Wu, Xiao-Jun, Feng, Zhenhua, and Kittler, Josef
- Subjects
- FEATURE extraction; SPINE; OBJECT tracking (Computer vision)
- Abstract
Recently, there has been a growing interest in RGB-D object tracking thanks to the promising performance achieved by combining visual information with auxiliary depth cues. However, the limited volume of annotated RGB-D tracking data for offline training has hindered the development of a dedicated end-to-end RGB-D tracker design. Consequently, the current state-of-the-art RGB-D trackers mainly rely on the visual branch to support appearance modelling, with the depth map utilised for elementary information fusion or failure reasoning during online tracking. Despite the progress achieved, the current paradigms for RGB-D tracking have not fully harnessed the inherent potential of depth information, nor fully exploited the synergy of vision-depth information. Considering the availability of ample unlabelled RGB-D data and the advancement of self-supervised learning, we address the problem of self-supervised learning for RGB-D object tracking. Specifically, an RGB-D backbone network is trained on unlabelled RGB-D datasets using masked image modelling. To train the network, the masking mechanism selectively occludes the input visible image, forcing the corresponding aligned depth map to help with discerning and learning vision-depth cues for the reconstruction of the masked visible image. As a result, the pre-trained backbone network is capable of capturing crucial visual and depth features of the diverse objects and background in an RGB-D image. The intermediate RGB-D features output by the pre-trained network can effectively be used for object tracking. We thus embed the pre-trained RGB-D network into a transformer-based tracking framework for stable tracking. Comprehensive experiments and the analysis of the results obtained on several RGB-D tracking datasets demonstrate the effectiveness and superiority of the proposed RGB-D self-supervised learning framework and the subsequent tracking approach. • A novel RGB-D backbone network based on self-supervised learning. • Joint extraction of RGB-D feature representation for object localisation. • A Transformer-based tracking method for RGB-D object tracking. • Extensive experiments and analyses on four RGB-D tracking benchmarks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
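A toy sketch of the pre-training signal described in the abstract above: mask patches of the RGB image, keep the aligned depth map intact, and train the network to reconstruct the masked RGB patches so that depth must be consulted. The shapes, masking ratio, and tiny encoder are all assumptions rather than the authors' design.

```python
# Hypothetical sketch: masked image modelling where depth helps reconstruct
# masked RGB patches (the real backbone and decoder are far more elaborate).
import torch
import torch.nn as nn

patch, dim, ratio = 16, 128, 0.6                                 # fabricated hyper-parameters

class TinyRGBDMIM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed_rgb = nn.Conv2d(3, dim, patch, stride=patch)  # RGB patch embedding
        self.embed_dep = nn.Conv2d(1, dim, patch, stride=patch)  # depth patch embedding
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
        self.decode = nn.Linear(dim, 3 * patch * patch)          # predict RGB pixels per patch

    def forward(self, rgb, depth):
        r = self.embed_rgb(rgb).flatten(2).transpose(1, 2)       # (B, N, D)
        d = self.embed_dep(depth).flatten(2).transpose(1, 2)     # (B, N, D)
        mask = torch.rand(r.shape[:2], device=r.device) < ratio  # (B, N), True = masked
        r = r.masked_fill(mask.unsqueeze(-1), 0.0)               # hide masked RGB tokens
        tokens = self.encoder(r + d)                             # depth stays visible everywhere
        return self.decode(tokens), mask

model = TinyRGBDMIM()
rgb, depth = torch.rand(1, 3, 224, 224), torch.rand(1, 1, 224, 224)
pred, mask = model(rgb, depth)
# a reconstruction loss would compare pred with the true pixels of masked patches only
```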
9. FPNFuse: A lightweight feature pyramid network for infrared and visible image fusion.
- Author
- Zhang, Zi-Han, Wu, Xiao-Jun, and Xu, Tianyang
- Subjects
- IMAGE fusion; INFRARED imaging; FEATURE extraction; PYRAMIDS; DEEP learning
- Abstract
A novel deep learning structure for infrared and visible image fusion is proposed. In particular, feature pyramid networks are developed for enhanced feature extraction across multiple convolutional layers. In addition, an improved fusion strategy based on the channel attention mechanism is adopted to highlight the relevant attributes in the fusion stage. The fusion method consists of four parts: an encoder, feature pyramid networks, the fusion strategy, and a decoder. First, multi-scale deep features are extracted from the source images by the encoder with embedded feature pyramid networks, realizing cross-layer interaction. Second, these features are fused by the improved fusion strategy with channel attention for each scale. Finally, the fused features are reconstructed by the designed decoder to produce the informative fused image. The experimental results show that the proposed fusion method achieves state-of-the-art results in both qualitative and quantitative evaluation with a lightweight architecture. [ABSTRACT FROM AUTHOR]
- Published
- 2022
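A compact sketch of a channel-attention fusion strategy in the spirit of the FPNFuse abstract above: global pooling produces per-channel weights, and the infrared and visible features are blended accordingly. The squeeze-and-excitation-style gating used here is an assumption, not necessarily the paper's exact strategy.

```python
# Hypothetical sketch: channel-attention fusion of infrared and visible features.
import torch
import torch.nn as nn

class ChannelAttentionFusion(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.gate = nn.Sequential(                      # per-channel weights from pooled statistics
            nn.Linear(channels * 2, channels), nn.ReLU(),
            nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, ir_feat: torch.Tensor, vis_feat: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = ir_feat.shape
        stats = torch.cat([ir_feat.mean(dim=(2, 3)), vis_feat.mean(dim=(2, 3))], dim=1)  # (B, 2C)
        w = self.gate(stats).view(b, c, 1, 1)           # (B, C, 1, 1), weights in [0, 1]
        return w * ir_feat + (1.0 - w) * vis_feat       # channel-wise blend of the two modalities

ir, vis = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
fused = ChannelAttentionFusion()(ir, vis)
```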
10. FEXNet: Foreground Extraction Network for Human Action Recognition.
- Author
- Shen, Zhongwei, Wu, Xiao-Jun, and Xu, Tianyang
- Subjects
- HUMAN activity recognition; CONVOLUTIONAL neural networks; FEATURE extraction
- Abstract
As most human actions in video sequences embody continuous interactions between foregrounds rather than the background scene, it is important to disentangle these foregrounds from the background for advanced action recognition systems. In this paper, therefore, we propose a Foreground EXtraction (FEX) block to explicitly model the foreground clues and achieve effective management of action subjects. In particular, the designed FEX block contains two components. The first is a Foreground Enhancement (FE) module, which highlights the potential feature channels related to the action attributes, providing channel-level refinement for the following spatiotemporal modeling. The second is a Scene Segregation (SS) module, which splits feature maps into foreground and background. Specifically, a temporal model with dynamic enhancement is constructed for the foreground part, reflecting the essential nature of the action category, while the background is modeled using simple spatial convolutions, mapping the inputs to a consistent feature space. The FEX blocks can be inserted into existing 2D CNNs (denoted as FEXNet) for spatiotemporal modeling, concentrating on the foreground clues for effective action inference. Our experiments performed on Something-Something V1, V2 and Kinetics400 verify the effectiveness of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2022
11. From RGB to Depth: Domain Transfer Network for Face Anti-Spoofing.
- Author
- Wang, Yahang, Song, Xiaoning, Xu, Tianyang, Feng, Zhenhua, and Wu, Xiao-Jun
- Abstract
With the rapid development in face recognition, most of the existing systems can perform very well in unconstrained scenarios. However, it is still a very challenging task to detect face spoofing attacks, thus face anti-spoofing has become one of the most important research topics in the community. Though various anti-spoofing models have been proposed, the generalisation capability of these models usually degrades for unseen attacks in the presence of challenging appearance variations, e.g., background, illumination, diverse spoofing materials and low image quality. To address this issue, we propose to use a Generative Adversarial Network (GAN) that transfers an input face image from the RGB domain to the depth domain. The generated depth clue enables biometric preservation against challenging appearance variations and diverse image qualities. To be more specific, the proposed method has two main stages. The first one is a GAN-based domain transfer module that converts an input image to its corresponding depth map. By design, a live face image should be transferred to a depth map whereas a spoofing face image should be transferred to a plain (black) image. The aim is to improve the discriminative capability of the proposed system. The second stage is a classification model that determines whether an input face image is live or spoofing. Benefiting from the use of the GAN-based domain transfer module, the latent variables can effectively represent the depth information, complementarily enhancing the discrimination of the original RGB features. The experimental results obtained on several benchmarking datasets demonstrate the effectiveness of the proposed method, with superior performance over the state-of-the-art methods. The source code of the proposed method is publicly available at https://github.com/coderwangson/DFA. [ABSTRACT FROM AUTHOR]
- Published
- 2021
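The abstract above states the key supervision rule: a live face should translate to its depth map while a spoof face should translate to a plain black image. The snippet below sketches only that target construction and a reconstruction loss for the generator; the generator itself, the source of the depth labels, and the helper names are placeholders, and the adversarial terms of the GAN are omitted.

```python
# Hypothetical sketch of the target-building rule for RGB-to-depth transfer:
# live faces are supervised with a (pseudo) depth map, spoof faces with zeros.
import torch
import torch.nn.functional as F

def depth_targets(pseudo_depth: torch.Tensor, is_live: torch.Tensor) -> torch.Tensor:
    """pseudo_depth: (B, 1, H, W) depth labels for live faces; is_live: (B,) bool."""
    keep = is_live.view(-1, 1, 1, 1).float()
    return pseudo_depth * keep                    # spoof samples get an all-black target

def generator_recon_loss(pred_depth, pseudo_depth, is_live) -> torch.Tensor:
    target = depth_targets(pseudo_depth, is_live)
    return F.l1_loss(pred_depth, target)          # GAN adversarial terms omitted here

# toy usage with fabricated data
pred = torch.rand(4, 1, 32, 32, requires_grad=True)
labels = torch.tensor([True, False, True, False])
loss = generator_recon_loss(pred, torch.rand(4, 1, 32, 32), labels)
loss.backward()
```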
12. Complementary Discriminative Correlation Filters Based on Collaborative Representation for Visual Object Tracking.
- Author
- Zhu, Xue-Feng, Wu, Xiao-Jun, Xu, Tianyang, Feng, Zhen-Hua, and Kittler, Josef
- Subjects
- OBJECT tracking (Computer vision); TRACKING algorithms; ART; FILTERS & filtration; ART objects; REGRESSION analysis
- Abstract
In recent years, discriminative correlation filter (DCF) based algorithms have significantly advanced the state of the art in visual object tracking. The key to the success of DCF is an efficient discriminative regression model trained with powerful multi-cue features, including both hand-crafted and deep neural network features. However, the tracking performance is hindered by their inability to respond adequately to abrupt target appearance variations. This issue is posed by the limited representation capability of fixed image features. In this work, we set out to rectify this shortcoming by proposing a complementary representation of the visual content. Specifically, we propose the use of a collaborative representation between successive frames to extract the dynamic appearance information from a target with rapid appearance changes, which results in suppressing the undesirable impact of the background. The resulting collaborative representation coefficients are combined with the original feature maps using a spatially regularised DCF framework for performance boosting. The experimental results on several benchmarking datasets demonstrate the effectiveness and robustness of the proposed method, as compared with a number of state-of-the-art tracking algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2021
13. Learning Low-Rank and Sparse Discriminative Correlation Filters for Coarse-to-Fine Visual Object Tracking.
- Author
- Xu, Tianyang, Feng, Zhen-Hua, Wu, Xiao-Jun, and Kittler, Josef
- Subjects
- OBJECT tracking (Computer vision); FILTERS & filtration; LAGRANGE multiplier; FEATURE extraction; TASK analysis
- Abstract
Discriminative correlation filter (DCF) has achieved advanced performance in visual object tracking with remarkable efficiency guaranteed by its implementation in the frequency domain. However, the effect of the structural relationship of DCF and object features has not been adequately explored in the context of the filter design. To remedy this deficiency, this paper proposes a Low-rank and Sparse DCF (LSDCF) that improves the relevance of features used by discriminative filters. To be more specific, we extend the classical DCF paradigm from ridge regression to lasso regression, and constrain the estimate to be of low-rank across frames, thus identifying and retaining the informative filters distributed on a low-dimensional manifold. To this end, specific temporal-spatial-channel configurations are adaptively learned to achieve enhanced discrimination and interpretability. In addition, we analyse the complementary characteristics between hand-crafted features and deep features, and propose a coarse-to-fine heuristic tracking strategy to further improve the performance of our LSDCF. Last, the augmented Lagrange multiplier optimisation method is used to achieve efficient optimisation. The experimental results obtained on a number of well-known benchmarking datasets, including OTB2013, OTB50, OTB100, TC128, UAV123, VOT2016 and VOT2018, demonstrate the effectiveness and robustness of the proposed method, delivering outstanding performance compared to the state-of-the-art trackers. [ABSTRACT FROM AUTHOR]
- Published
- 2020
14. Face anti-spoofing with local difference network and binary facial mask supervision.
- Author
- Chen, Suyang, Song, Xiaoning, Feng, Zhenhua, Xu, Tianyang, Wu, Xiaojun, and Kittler, Josef
- Subjects
- DEEP learning; SUPERVISION; CONVOLUTIONAL neural networks
- Abstract
Face anti-spoofing (FAS) is crucial for safe and reliable biometric systems. In recent years, deep neural networks have been proven to be very effective for FAS as compared with classical approaches. However, deep learning-based FAS methods are data-driven and use learning-based features only. It is a legitimate question to ask whether hand-crafted features can provide any complementary information to a deep learning-based FAS method. To answer this question, we propose a two-stream network that consists of a convolutional network and a local difference network. To be specific, we first build a texture extraction convolutional block to calculate the gradient magnitude at each pixel of an input image. Our experiments demonstrate that additional liveness cues can be captured by the proposed method. Second, we design an attention fusion module to combine the features obtained from the RGB domain and gradient magnitude domain, aiming for discriminative information mining and information redundancy elimination. Finally, we advocate a simple binary facial mask supervision strategy for further performance boost. The proposed network has only 2.79M parameters and the inference speed is up to 118 frames per second, which makes it very convenient for real-time FAS systems. The experimental results obtained on several well-known benchmarking datasets demonstrate the merits and superiority of the proposed method over the state-of-the-art approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2022
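The abstract above describes a texture extraction block that computes the gradient magnitude at each pixel. A common way to do this with fixed convolution kernels is Sobel filtering, sketched below; the paper's exact operator is not specified here, so treat the Sobel choice as an assumption.

```python
# Hypothetical sketch: per-pixel gradient magnitude via fixed Sobel kernels.
import torch
import torch.nn.functional as F

def gradient_magnitude(gray: torch.Tensor) -> torch.Tensor:
    """gray: (B, 1, H, W) single-channel image; returns (B, 1, H, W) magnitudes."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)                           # Sobel kernel for the vertical direction
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-12)      # epsilon keeps the gradient finite at 0

img = torch.rand(1, 1, 64, 64)
mag = gradient_magnitude(img)                         # input to the local difference branch
```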
15. DDBFusion: An unified image decomposition and fusion framework based on dual decomposition and Bézier curves.
- Author
- Zhang, Zeyang, Li, Hui, Xu, Tianyang, Wu, Xiao-Jun, and Kittler, Josef
- Subjects
- IMAGE fusion; INFRARED imaging; FEATURE extraction; IMAGE analysis; RECOMMENDER systems
- Abstract
Existing image fusion algorithms mostly concentrate on the design of network architectures and loss functions, using unified feature extraction strategies while neglecting the division of redundant and effective information. However, for complementary information, a unified feature extractor may not be appropriate. Thus, this paper presents a unified image fusion algorithm based on Bézier-curve image augmentation and hierarchical decomposition, in which a self-supervised learning task is constructed to learn the meaningful information. The Bézier curves simulate different image features and construct special self-supervised training samples, so our method does not require task-specific data and can be easily trained on public natural image datasets. Meanwhile, our dual-decomposition self-supervised training method brings a redundant-information filtering capability to the model. During the decomposition stage, we classify and extract different features of the images and only utilize the extracted effective information in the fusion stage; this decomposition ability also provides a foundation for advanced visual tasks, such as image segmentation and object detection. Finally, more detailed and comprehensive fusion images are generated, and the presence of redundant information is effectively reduced. The validity of the proposed method is verified through qualitative and quantitative analysis of multiple image fusion tasks, and our algorithm achieves state-of-the-art results on multiple datasets of different image fusion tasks. The code of our fusion method is available at https://github.com/Yukarizz/DDBFusion. • A novel self-supervised learning method is proposed to decompose the image. • Interpretability of the fusion process. • A unified image fusion method. • Good performance on high-level vision tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2025
16. UMFA: a photorealistic style transfer method based on U-Net and multi-layer feature aggregation.
- Author
- Rao, Dongyu, Wu, Xiao-Jun, Li, Hui, Kittler, Josef, and Xu, Tianyang
- Subjects
- IMAGE reconstruction; DEEP learning; FEATURE extraction
- Abstract
We propose a photorealistic style transfer network to emphasize the natural effect of photorealistic image stylization. In general, distortion of the image content and lack of details are two typical issues in the style transfer field. To this end, we design a framework employing the U-Net structure to maintain the rich spatial clues, with a multi-layer feature aggregation (MFA) method to simultaneously provide the details obtained by the shallow layers in the stylization process. In particular, an encoder based on dense blocks and a decoder, forming a symmetrical U-Net structure, are jointly stacked to realize effective feature extraction and image reconstruction. In addition, a transfer module based on MFA and "adaptive instance normalization" is inserted at the skip connection positions to achieve the stylization. Accordingly, the stylized image possesses the texture of a real photo and preserves rich content details without introducing any mask or postprocessing steps. The experimental results on public datasets demonstrate that our method achieves a more faithful structural similarity with a lower style loss, reflecting the effectiveness and merit of our approach. [ABSTRACT FROM AUTHOR]
- Published
- 2021
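The abstract above mentions an "adaptive instance normalization" transfer module inserted at the skip connections. AdaIN itself has a standard closed form, sketched below; how UMFA combines it with the multi-layer feature aggregation is not reproduced here.

```python
# Standard adaptive instance normalisation (AdaIN): align the channel-wise mean
# and standard deviation of content features with those of style features.
import torch

def adain(content: torch.Tensor, style: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """content, style: (B, C, H, W) feature maps."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean

content_feat = torch.randn(1, 64, 32, 32)       # fabricated encoder features
style_feat = torch.randn(1, 64, 32, 32)
stylised = adain(content_feat, style_feat)
```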
17. An accelerated correlation filter tracker.
- Author
- Xu, Tianyang, Feng, Zhen-Hua, Wu, Xiao-Jun, and Kittler, Josef
- Subjects
- OBJECT tracking (Computer vision); FEATURE extraction; TRACKING algorithms; ADAPTIVE filters; DYNAMICAL systems; FILTERS & filtration; COMPUTATIONAL complexity
- Abstract
• A formulation of the DCF design problem which focuses on informative feature channels and spatial structures by means of novel regularisation. • A proposed relaxed optimisation algorithm, referred to as R_A-ADMM, for optimising the regularised DCF. In contrast with the standard ADMM, the algorithm achieves a better convergence rate. • A temporal smoothness constraint, implemented by an adaptive initialisation mechanism, to achieve a further speed-up via transfer learning among video frames. • The proposed adoption of AlexNet to construct a light-weight deep representation with a tracking accuracy comparable to more complicated deep networks, such as VGG and ResNet. • An extensive evaluation of the proposed methodology on several well-known visual object tracking datasets, with the results confirming the acceleration gains for the regularised DCF paradigm. Recent visual object tracking methods have witnessed a continuous improvement in the state of the art with the development of efficient discriminative correlation filters (DCF) and robust deep neural network features. Despite the outstanding performance achieved by the above combination, existing advanced trackers suffer from the high computational complexity of deep feature extraction and online model learning. We propose an accelerated ADMM optimisation method obtained by adding a momentum to the optimisation sequence iterates, and by relaxing the impact of the error between the DCF parameters and their norm. The proposed optimisation method is applied to an innovative formulation of the DCF design, which seeks the most discriminative spatially regularised feature channels. A further speed-up is achieved by an adaptive initialisation of the filter optimisation process. The significantly faster convergence of the DCF is demonstrated by establishing the equivalence of the optimisation process with a continuous dynamical system for which the convergence properties can readily be derived. The experimental results obtained on several well-known benchmarking datasets demonstrate the efficiency and robustness of the proposed ACFT method, with a tracking accuracy comparable to the state-of-the-art trackers. [ABSTRACT FROM AUTHOR]
- Published
- 2020
18. Enhanced robust spatial feature selection and correlation filter learning for UAV tracking.
- Author
- Wen, Jiajun, Chu, Honglin, Lai, Zhihui, Xu, Tianyang, and Shen, Linlin
- Subjects
- FEATURE selection; OBJECT tracking (Computer vision); FEATURE extraction; TRACKING radar; AIR filters
- Abstract
The spatial boundary effect can significantly reduce the performance of a learned discriminative correlation filter (DCF) model. A commonly used method to relieve this effect is to extract appearance features from a wider region of the target. However, this introduces unwanted features from background pixels and noise, which decreases the filter's discrimination power. To address this shortcoming, this paper proposes an innovative method called enhanced robust spatial feature selection and correlation filter learning (EFSCF), which performs jointly sparse feature learning to handle boundary effects effectively while suppressing the influence of background pixels and noise. Unlike the ℓ2-norm-based tracking approaches that are prone to non-Gaussian noise, the proposed method imposes the ℓ2,1-norm on the loss term to enhance robustness against training outliers. To enhance the discrimination further, a jointly sparse feature selection scheme based on the ℓ2,1-norm is designed to regularize the filter in rows and columns simultaneously. To the best of the authors' knowledge, this is the first work exploring structural sparsity in the rows and columns of a learned filter simultaneously. The proposed model can be efficiently solved by an alternating direction method of multipliers. The proposed EFSCF is verified by experiments on four challenging unmanned aerial vehicle datasets under severe noise and appearance changes, and the results show that the proposed method achieves better tracking performance than the state-of-the-art trackers. [ABSTRACT FROM AUTHOR]
- Published
- 2023
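For reference, the ℓ2,1 norm used in the abstract above (for both the loss and the regulariser) is the sum of the Euclidean norms of the matrix rows. The joint row-and-column usage shown below is one plausible reading of "regularize the filter in rows and columns simultaneously", not the exact published objective.

```python
# The l2,1 norm of a matrix: sum of the Euclidean norms of its rows.
import torch

def l21_norm(w: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """w: (rows, cols); returns sum_i ||w_i||_2, treating rows as groups."""
    return torch.sqrt((w ** 2).sum(dim=1) + eps).sum()

w = torch.randn(5, 4, requires_grad=True)
reg = l21_norm(w) + l21_norm(w.t())     # assumed joint row/column regularisation
reg.backward()
```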