513 results
Search Results
2. Characteristic discriminative prototype network with detailed interpretation for classification.
- Author
- Wen, Jiajun, Kong, Heng, Lai, Zhihui, and Zhu, Zhijie
- Subjects
- *SIMILARITY (Physics), *DEEP learning, *LEARNING strategies, *PROTOTYPES, *PRIOR learning
- Abstract
Existing prototype learning methods provide limited interpretation of which patches from input images are similar to the corresponding prototypes. Moreover, these methods do not consider the diversity among the prototypes, which leads to low classification accuracy. To address these problems, this paper proposes the Characteristic Discriminative Prototype Network (CDPNet), which offers clear interpretation of both local regions and their characteristics. The network designs a feature prototype to represent the discriminative feature and a characteristic prototype to characterize the prototype's properties across different individuals. In addition, two novel strategies, dynamic region learning and similarity score minimization among similar intra-class prototypes, are designed to learn the prototypes so as to improve their diversity. As a result, CDPNet can explain which characteristic within the image is the most important one for classification. Experimental results on well-known datasets show that CDPNet provides clearer interpretations and obtains state-of-the-art classification performance among prototype learning methods. • We propose a novel interpretable prototype learning method for classification. • The prototype learning mechanism in previous research is analyzed and its drawbacks are revealed. • This paper presents a new strategy to learn the prototype within the dynamic region by similarity score. [ABSTRACT FROM AUTHOR]
- Published
- 2025
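The abstract turns on scoring how similar image patches are to a learned prototype. As a minimal sketch, assuming patch features are plain vectors and borrowing the log-activation similarity from the earlier ProtoPNet model (CDPNet's own score is not specified here and may differ), the best-matching patch for one prototype can be found like this:

```python
import math


def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototype_similarity(patches, prototype, eps=1e-4):
    """Return (index of closest patch, similarity) for one prototype.

    Assumption: ProtoPNet-style activation log((d + 1) / (d + eps)),
    where d is the squared L2 distance; not CDPNet's published score.
    """
    distances = [l2_squared(p, prototype) for p in patches]
    best = min(range(len(distances)), key=distances.__getitem__)
    d = distances[best]
    return best, math.log((d + 1) / (d + eps))


# Hypothetical 2-D patch features extracted from one image.
patches = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.1]]
prototype = [1.0, 1.0]
idx, sim = prototype_similarity(patches, prototype)
print(idx, round(sim, 2))  # 1 9.21
```

The returned index is what makes such a score interpretable: it points back to the patch that activated the prototype most strongly.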
3. LCCo: Lending CLIP to co-segmentation.
- Author
- Duan, Xin, Yang, Yan, Pan, Liyuan, and Liu, Xiabi
- Subjects
- *PERFORMANCE standards, *SPINE, *ENCODING, *CLASSIFICATION
- Abstract
This paper studies co-segmenting common semantic objects in a set of images. Existing works either rely on carefully engineered networks to mine implicit semantics in visual features or require extra data (i.e., classification labels) for training. In this paper, we leverage the contrastive language-image pre-training framework (CLIP) for the task. With a backbone segmentation network that processes each image from the set, we introduce semantics from CLIP into the backbone features, refining them in a coarse-to-fine manner with three key modules: (i) an image set feature correspondence module, encoding globally consistent semantics of the image set; (ii) a CLIP interaction module, using CLIP-mined common semantics of the image set to refine the backbone feature; (iii) a CLIP regularization module, drawing CLIP towards co-segmentation by identifying and using the best CLIP semantics to regularize the backbone feature. Experiments on four standard co-segmentation benchmark datasets show that our method outperforms state-of-the-art methods. • We propose a framework for leveraging CLIP to co-segment common semantics of images. • We design a module to encode the global semantics of the image set. • We design modules to mine common semantics in a coarse-to-fine manner. • We draw CLIP towards co-segmentation by using an MLP with a tailored loss. • We demonstrate state-of-the-art performance on standard benchmarks. [ABSTRACT FROM AUTHOR]
- Published
- 2025
4. Hierarchical mixture of discriminative Generalized Dirichlet classifiers.
- Author
- Togban, Elvis and Ziou, Djemel
- Subjects
- *COLOR space, *MIXTURES, *CLASSIFICATION, *SPAM email
- Abstract
This paper presents a discriminative classifier for compositional data. This classifier is based on the posterior distribution of the Generalized Dirichlet, which is the discriminative counterpart of the Generalized Dirichlet mixture model. Moreover, following the mixture-of-experts paradigm, we propose a hierarchical mixture of this classifier. To learn the models' parameters, we use a variational approximation by deriving an upper bound for the Generalized Dirichlet mixture. To the best of our knowledge, this is the first time this bound has been proposed in the literature. Experimental results are presented for spam detection and color space identification. • This paper addresses the challenge of compositional data classification. • A discriminative classifier based on the Generalized Dirichlet (GD) distribution is proposed. • A meta-classifier based on the hierarchical mixture-of-experts paradigm was built. • An upper bound for the mixture of GD was proposed, allowing a variational approximation. • The performance of the models was assessed through spam detection and color space identification. [ABSTRACT FROM AUTHOR]
- Published
- 2024
5. A lazy bagging approach to classification
- Author
- Zhu, Xingquan and Yang, Ying
- Subjects
- *CLASSIFICATION, *COTTON trade, *BENCHMARKING (Management), *PAPER
- Abstract
Abstract: In this paper, we propose lazy bagging (LB), which builds bootstrap replicate bags based on the characteristics of test instances. Upon receiving a test instance, LB trims bootstrap bags by taking into consideration the test instance's nearest neighbors in the training data. Our hypothesis is that an unlabeled instance's nearest neighbors provide valuable information to enhance local learning and generate a classifier with refined decision boundaries emphasizing the test instance's surrounding region. In particular, by taking full advantage of the test instance's nearest neighbors, classifiers are able to reduce classification bias and variance when classifying that instance. As a result, LB, which is built on these classifiers, can significantly reduce classification error compared with the traditional bagging (TB) approach. To investigate LB's performance, we first use carefully designed synthetic data sets to gain insight into why LB works and under which conditions it can outperform TB. We then test LB against four rival algorithms on a large suite of 35 real-world benchmark data sets using a variety of statistical tests. Empirical results confirm that LB can statistically significantly outperform alternative methods in terms of reducing classification error. [Copyright © Elsevier]
- Published
- 2008
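The lazy-bagging idea above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each bag is the test instance's k nearest training neighbours plus len(train) - k instances drawn with replacement, and it uses a hypothetical nearest-centroid base learner chosen only for brevity.

```python
import math
import random
from collections import Counter


def nearest_centroid_predict(bag, x):
    """Hypothetical base learner: label of the class centroid closest to x."""
    totals, counts = {}, Counter()
    for features, label in bag:
        total = totals.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] += 1
    centroids = {label: [s / counts[label] for s in total]
                 for label, total in totals.items()}
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


def lazy_bagging_predict(train, x, k=3, n_bags=11, seed=0):
    """Majority vote over bags that all contain x's k nearest neighbours.

    Assumed bag construction: the k nearest training neighbours of x plus
    len(train) - k instances drawn uniformly with replacement.
    """
    rng = random.Random(seed)
    neighbours = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter()
    for _ in range(n_bags):
        bag = neighbours + [rng.choice(train) for _ in range(len(train) - k)]
        votes[nearest_centroid_predict(bag, x)] += 1
    return votes.most_common(1)[0][0]


train = [([0.0, 0.0], "a"), ([0.2, 0.1], "a"), ([0.1, 0.3], "a"),
         ([3.0, 3.0], "b"), ([3.2, 2.9], "b"), ([2.8, 3.1], "b")]
print(lazy_bagging_predict(train, [0.1, 0.1]))  # a
```

Because every bag is rebuilt for each query, the cost is paid at prediction time, which is what makes the approach "lazy".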
6. Efficient pattern synthesis for nearest neighbour classifier
- Author
- Agrawal, Monu, Gupta, Neha, Shreelekshmi, R., and Narasimha Murty, M.
- Subjects
- *CLASSIFICATION, *INFORMATION organization, *ARCHIVES, *PAPER
- Abstract
Abstract: Synthetic pattern generation is one of the strategies to overcome the curse of dimensionality, but it has its own drawbacks: most synthetic pattern generation techniques take more time than simple classification. In this paper, we propose a new strategy to reduce the time and memory requirements by applying prototyping as an intermediate step in the synthetic pattern generation technique. Results show that, with the proposed strategy, classification can be done much faster without much loss in classification accuracy; in some cases it even gives better accuracy in less time. The classification time and accuracy can be balanced according to the available memory and computing power of a system to get the best possible results. [Copyright © Elsevier]
- Published
- 2005
7. Self-supervised learning from images: No negative pairs, no cluster-balancing.
- Author
- Mei, Jian-Ping, Wang, Shixiang, and Yu, Miaoqi
- Subjects
- *IMAGE recognition (Computer vision), *ARCHITECTURAL design, *CROSS correlation, *LEARNING, *IMAGE representation, *CLASSIFICATION
- Abstract
Learning with self-derived targets provides a non-contrastive method for unsupervised image representation learning, where the variety in targets is crucial. Recent work has achieved good performance by learning with targets obtained via cluster-balancing. However, the equal-cluster-size constraint becomes too restrictive for data with imbalanced categories or data arriving in small batches. In this paper, we propose a new clustering-based approach for non-contrastive image representation learning that needs no particular architecture design, no extra memory bank, and no explicit constraint on cluster size. A key formulation is to learn embedding consistency and variable decorrelation in the cluster space by pushing the batch-wise cross-correlation matrix towards the identity matrix. With this identitization loss incorporated, the predicted cluster assignments of two randomly augmented views of the same image serve as targets for each other. We carried out comprehensive experimental studies on benchmark image datasets, evaluating the learned representations by linear classification. Our results show that the proposed approach significantly outperforms state-of-the-art approaches and is more robust to class imbalance than approaches that rely on cluster balancing. • A new approach for non-contrastive representation learning from unlabeled images. • Requires no memory bank or large batch size and is less sensitive to class imbalance. • Analysis and discussion of the limitations of the existing cluster-balancing strategy. • Comprehensive evaluation with comparisons and various analytical studies. [ABSTRACT FROM AUTHOR]
- Published
- 2025
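The identitization loss described above pushes a cross-correlation matrix between two views' cluster assignments towards the identity. A minimal sketch follows, assuming (not confirmed by the abstract) that each cluster column is L2-normalised over the batch and that the loss is the Barlow Twins-style sum of squared diagonal gaps and off-diagonal entries; `off_weight` is a hypothetical trade-off parameter.

```python
import math


def normalize_columns(p, eps=1e-8):
    """L2-normalise each cluster column over the batch (assumed normalisation)."""
    n, d = len(p), len(p[0])
    norms = [math.sqrt(sum(p[b][j] ** 2 for b in range(n))) + eps for j in range(d)]
    return [[p[b][j] / norms[j] for j in range(d)] for b in range(n)]


def cross_correlation(p1, p2):
    """d x d matrix with C[i][j] = sum over the batch of a[b][i] * c[b][j]."""
    a, c = normalize_columns(p1), normalize_columns(p2)
    n, d = len(a), len(a[0])
    return [[sum(a[b][i] * c[b][j] for b in range(n)) for j in range(d)]
            for i in range(d)]


def identitization_loss(p1, p2, off_weight=1.0):
    """Penalise deviation of the cross-correlation matrix from the identity."""
    cc = cross_correlation(p1, p2)
    d = len(cc)
    diag = sum((1.0 - cc[i][i]) ** 2 for i in range(d))  # view consistency
    off = sum(cc[i][j] ** 2 for i in range(d) for j in range(d) if i != j)  # decorrelation
    return diag + off_weight * off


view1 = [[1, 0], [0, 1], [1, 0], [0, 1]]    # cluster assignments, view 1
view2 = [[1, 0], [0, 1], [1, 0], [0, 1]]    # agreeing second view
swapped = [[0, 1], [1, 0], [0, 1], [1, 0]]  # disagreeing second view
print(round(identitization_loss(view1, view2), 6))    # 0.0
print(round(identitization_loss(view1, swapped), 6))  # 4.0
```

Note that nothing in this loss forces clusters to be equally sized; the columns only need to be consistent across views and mutually decorrelated.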
8. Pixel shuffling is all you need: spatially aware convmixer for dense prediction tasks.
- Author
- Ibrahem, Hatem, Salem, Ahmed, and Kang, Hyun-Soo
- Subjects
- *IMAGE recognition (Computer vision), *CONVOLUTIONAL neural networks, *MONOCULARS, *CLASSIFICATION
- Abstract
ConvMixer is an extremely simple model that can perform better than state-of-the-art convolution-based and vision transformer-based methods by mixing the input image patches with standard convolutions. This global mixing of patches is only valid for classification tasks; it cannot be used for dense prediction tasks because the spatial information of the image is lost in the mixing process. We propose a more efficient technique for image patching, known as pixel shuffling, which preserves spatial information. We downsample the input image using pixel shuffle downsampling in the same form as image patches so that ConvMixer can be extended to dense prediction tasks. This paper demonstrates that pixel shuffle downsampling is more efficient than standard image patching, as it outperforms the original ConvMixer architecture on the CIFAR10 and ImageNet-1k classification tasks. We also propose spatially-aware ConvMixer architectures based on efficient pixel shuffle downsampling and upsampling operations for semantic segmentation and monocular depth estimation. We performed extensive experiments to test the proposed architectures on several datasets: Pascal VOC2012, Cityscapes, and ADE20k for semantic segmentation, and NYU-depthV2 and Cityscapes for depth estimation. We show that SA-ConvMixer is efficient enough to achieve relatively high accuracy on many tasks in few training epochs (150 to 400). The proposed SA-ConvMixer achieves an ImageNet-1K Top-1 classification accuracy of 87.02%, a mean intersection over union (mIoU) of 87.1% on the PASCAL VOC2012 semantic segmentation task, and an absolute relative error of 0.096 on the NYU-depthV2 depth estimation task. The implementation code of the proposed method is available at: https://github.com/HatemHosam/SA-ConvMixer/. • Pixel-shuffle patching preserves spatial relations, in contrast to standard image patching. • Pixel shuffling is more efficient than image patching in image classification. • The proposed SA-ConvMixer can efficiently learn the segmentation and depth estimation tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2025
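The central claim above is that pixel-shuffle downsampling keeps every pixel and its spatial arrangement, so it can be undone for dense prediction. A minimal pure-Python sketch of the space-to-depth rearrangement and its inverse on (C, H, W) nested lists follows; the channel ordering is intended to match PyTorch's `pixel_unshuffle`, and whether SA-ConvMixer uses exactly this ordering is an assumption.

```python
def pixel_unshuffle(image, r):
    """Space-to-depth: (C, H, W) -> (C*r*r, H//r, W//r), discarding no pixel.

    out[c*r*r + i*r + j][y][x] == image[c][y*r + i][x*r + j]
    """
    channels, h, w = len(image), len(image[0]), len(image[0][0])
    if h % r or w % r:
        raise ValueError("height and width must be divisible by r")
    out = []
    for c in range(channels):
        for i in range(r):          # row offset inside each r x r block
            for j in range(r):      # column offset inside each r x r block
                out.append([[image[c][y * r + i][x * r + j] for x in range(w // r)]
                            for y in range(h // r)])
    return out


def pixel_shuffle(stack, r):
    """Depth-to-space inverse: (C*r*r, H, W) -> (C, H*r, W*r)."""
    channels = len(stack) // (r * r)
    h, w = len(stack[0]), len(stack[0][0])
    image = [[[0] * (w * r) for _ in range(h * r)] for _ in range(channels)]
    for c in range(channels):
        for i in range(r):
            for j in range(r):
                plane = stack[c * r * r + i * r + j]
                for y in range(h):
                    for x in range(w):
                        image[c][y * r + i][x * r + j] = plane[y][x]
    return image


image = [[[row * 4 + col for col in range(4)] for row in range(4)]]  # 1 x 4 x 4
planes = pixel_unshuffle(image, 2)  # 4 x 2 x 2
print(planes[0])                          # [[0, 2], [8, 10]]
print(pixel_shuffle(planes, 2) == image)  # True
```

Unlike patch flattening followed by global mixing, each output plane is still a spatially ordered, lower-resolution view of the input, which is why an upsampling path can restore full resolution.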
9. Dual-perspective multi-instance embedding learning with adaptive density distribution mining.
- Author
- Yang, Mei, Chen, Tian-Lin, Wu, Wei-Zhi, Zeng, Wen-Xi, Zhang, Jing-Yu, and Min, Fan
- Subjects
- *DENSITY, *VOTING, *ALGORITHMS, *CLASSIFICATION
- Abstract
Multi-instance learning (MIL) is a potent framework for solving weakly supervised problems, with bags containing multiple instances. Various embedding methods convert each bag into a vector in a new feature space based on a representative bag or instance, aiming to extract useful information from the bag. However, although the distribution of instances is related to the labels, these methods rely solely on an overall-perspective embedding without considering the different distribution characteristics, which conflates the varied distributions of instances and thus leads to poor classification performance. In this paper, we propose the dual-perspective multi-instance embedding learning with adaptive density distribution mining (DPMIL) algorithm with three new techniques. First, the mutual instance selection technique consists of adaptive density distribution mining and discriminative evaluation. The distribution characteristics of negative instances and heterogeneous instance dissimilarity are effectively exploited to obtain highly representative instances. Second, the embedding technique mines two crucial kinds of information from the bag simultaneously. Bags are converted into sequence-invariant vectors according to the dual perspective so that distinguishability is maintained. Finally, the ensemble technique trains a batch of classifiers. The final model is obtained by weighted voting with the contribution of the dual-perspective embedding information. The experimental results demonstrate that the DPMIL algorithm has higher average accuracy than the other compared algorithms, especially on web datasets. • We propose an adaptive density distribution mining method for instance selection. • We propose a dual-perspective embedding technique to maintain distinguishability. • We obtain the final ensemble model through weighted voting. • Our algorithm achieves SOTA performance in terms of accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2025
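The final DPMIL step combines an ensemble by weighted voting. The abstract does not say how the weights are derived from the dual-perspective embedding information, so the sketch below simply takes them as given; the labels and weights in the example are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label whose supporting classifiers have the largest total weight."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    # max() keeps the first maximal key, so ties go to the label seen first.
    return max(scores, key=scores.get)


# Hypothetical bag-level predictions from three classifiers and their weights.
print(weighted_vote(["positive", "negative", "negative"], [0.7, 0.2, 0.2]))  # positive
```

With equal weights the same three votes would return "negative"; the weighting lets a single trusted classifier override a weaker majority.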
10. ISDAT: An image-semantic dual adversarial training framework for robust image classification.
- Author
- Sui, Chenhong, Wang, Ao, Wang, Haipeng, Liu, Hao, Gong, Qingtao, Yao, Jing, and Hong, Danfeng
- Subjects
- *IMAGE recognition (Computer vision), *HEURISTIC, *NEURONS, *CONFIDENCE, *CLASSIFICATION
- Abstract
Adversarial training is known as one of the most effective heuristic defense methods. Unfortunately, most existing work focuses solely on image-space adversarial training and neglects the complementary semantic space. Semantic-space adversarial training can compensate for the insufficient diversity of adversarial examples in pure image-space training, thereby improving model robustness. It is therefore sensible to learn from both adversarial images and adversarial features. Accordingly, this paper proposes an image-semantic dual adversarial training framework (ISDAT) to enhance the robustness of classification models against multiple attacks. In the inner loop of ISDAT, to craft adversarial images as well as adversarial features, the benign images and semantic features are perturbed through the image-space path and the semantic-space path, respectively. To determine which intermediate layer of semantic features, when attacked, contributes most to improving the model's anti-attack capability, we provide theoretical analysis for guidance, avoiding invalid neuron importance predictions and excessive computation. To ensure that adversarial images and features each contribute to model robustness, we forge them with diverse loss views. Specifically, we develop a C2 loss for adversarial feature generation involving semantic variance, aggressiveness, and high confidence. In the outer loop of ISDAT, to promote the model's comprehensive understanding of both adversarial images and adversarial features, we give a joint image-semantic-guided model defense method. Specifically, we develop an adversarial image-semantic perception loss (IS). Driven by this loss, we further establish an image-semantic end-to-end optimization process, which allows dual learning from both adversarial images and features. Experimental results on the CIFAR-10, CIFAR-100, and SVHN datasets demonstrate the effectiveness of ISDAT in defending against multiple white-box and black-box attacks. The code will be available at https://github.com/flower6top. • We propose an image-semantic dual adversarial training framework (ISDAT) against multiple attacks. • We provide theoretical analysis of why and how to generate adversarial features. • In the inner loop of ISDAT, we devise a C2 loss for adversarial feature generation. • In the outer loop of ISDAT, we develop an adversarial image-semantic perception loss (IS). • The IS loss enables image-semantic-guided end-to-end model optimization. • ISDAT exhibits excellent performance in resisting white-box and black-box attacks. [ABSTRACT FROM AUTHOR]
- Published
- 2025
11. A novel multi-modal fusion method based on uncertainty-guided meta-learning.
- Author
- Zhang, Duoyi, Bashar, Md Abul, and Nayak, Richi
- Subjects
- *MACHINE learning, *MULTISENSOR data fusion, *CLASSIFICATION, *GENERALIZATION, *NOISE
- Abstract
Multi-modal data fusion for effective feature representation in machine learning is challenging due to intrinsic biases present within and across different modalities. Existing multi-modal data fusion methods often have difficulty learning generic features due to diverse noise patterns and variations in feature dynamics across different modalities. In this paper, we present a novel method called Uncertainty-guided Meta-Learning Multi-modal Fusion and Classification (UMLMC) to address these challenges. UMLMC dynamically transforms multi-modal feature spaces at both the pre- and post-fusion levels by incorporating uncertainty estimates from an auxiliary network. Our model is optimized using a meta-learning algorithm to enhance its generalization capabilities. Extensive experiments on multi-modal data from diverse domains, along with comparisons to state-of-the-art methods, demonstrate the effectiveness of UMLMC in improving classification performance. These results confirm that UMLMC, with its uncertainty estimation and meta-learning framework, effectively learns informative intra- and inter-modal features, leading to superior classification outcomes. • An uncertainty-guided meta-fusion method for multi-modal fusion and classification. • Mitigating the impact of feature-level bias both before and after fusion. • Meta-learning to generate less biased uncertainty estimation at the feature level. [ABSTRACT FROM AUTHOR]
- Published
- 2025
12. TBNet: A texture and boundary-aware network for small weak object detection in remote-sensing imagery.
- Author
- Li, Zheng, Wang, Yongcheng, Xu, Dongdong, Gao, Yunxiao, and Zhao, Tianqi
- Subjects
- *OBJECT recognition (Computer vision), *REMOTE sensing, *IMAGE analysis, *DEEP learning, *CLASSIFICATION
- Abstract
Object detection is of great importance for remote sensing image interpretation and has received significant attention. However, small weak object detection has always been a challenge. The main reason is that the critical information of these objects, such as textures and boundaries, is suppressed by the background, so the objects cannot effectively express their own characteristics. To address this issue, we introduce a novel texture and boundary-aware network (TBNet) in this paper. First, we propose a texture-aware enhancement module (TAEM) to explore the texture details within the images. TAEM captures pixel correlations to perceive the distribution of texture in the objects. Second, a boundary-aware fusion module (BAFM) is introduced to emphasize spatial positions. BAFM can extract edge information to guide the prediction of small weak objects. Finally, a task-decoupled RCNN (TD-RCNN) is designed to separate the classification and regression tasks. TD-RCNN achieves fine-grained detection, avoiding compromises between subtasks. Comprehensive experiments on four public datasets, DIOR, NWPU VHR-10, RSOD, and AI-TOD, demonstrate that TBNet achieves state-of-the-art performance compared to competitors. The model is also evaluated on UAVOD-10, which contains numerous small weak objects. TBNet achieves state-of-the-art results while significantly outperforming competitors, proving its ability to detect small weak objects. • TBNet is designed to detect small and weak objects in complex remote sensing images. • The network enhances object representations by exploring texture and boundary features. • TD-RCNN avoids feature coupling from shared classification and localization. • Results show the model detects remote-sensing objects effectively, especially weak ones. [ABSTRACT FROM AUTHOR]
- Published
- 2025
13. ZooKT: Task-adaptive knowledge transfer of Model Zoo for few-shot learning.
- Author
- Zhang, Baoquan, Shan, Bingqi, Li, Aoxue, Luo, Chuyao, Ye, Yunming, and Li, Zhenguo
- Subjects
- *MACHINE learning, *TRANSFER of training, *ZOOS, *SCARCITY, *CLASSIFICATION
- Abstract
Few-shot learning (FSL) aims to recognize novel classes with few examples. It is challenging because it suffers from data scarcity. Although existing methods have shown superior performance, they neglect pretrained model priors from various datasets/tasks, which are usually free or cheap to collect from open-source platforms (e.g., GitHub). Inspired by this, in this paper we focus on a new FSL setting, namely Model-Zoo-based FSL (MZFSL), which transfers knowledge learned not only from base classes but also from a zoo of deep models (called a Model Zoo) pretrained on various tasks/datasets. To fully exploit the Model Zoo while avoiding the risk of overfitting, we propose a novel knowledge transfer framework, called ZooKT, which amalgamates the knowledge of the model zoo, instead of finetuning it, to learn to extract transferable visual features for novel classes. Specifically, we first regard the model zoo as a prior for the feature extractor. Then a meta weight prediction network is designed to leverage the prior to predict the weights of the target feature extractor for novel classes. Finally, we integrate the target feature extractor into existing FSL methods to perform novel class prediction. Experimental results in few-shot classification and detection scenarios show that ZooKT outperforms not only single-model finetuning methods but also state-of-the-art multiple-model transfer learning methods, with inference time comparable to that of a single model. • We formulate a new FSL setting, i.e., Model-Zoo-based FSL (MZFSL). • We propose a novel FSL framework, i.e., ZooKT, which amalgamates knowledge of the model zoo to learn to extract transferable visual features. • Experimental results on multiple datasets show that our method achieves superior performance. [ABSTRACT FROM AUTHOR]
- Published
- 2025
14. A new data complexity measure for multi-class imbalanced classification tasks.
- Author
- Han, Mingming, Guo, Husheng, and Wang, Wenjian
- Subjects
- *SKEWNESS (Probability theory), *CLASSIFICATION
- Abstract
The skewed class distribution and data complexity may severely affect imbalanced classification results. The cost of classification can be significantly reduced if data complexity is measured and the data are pre-processed prior to training, particularly when dealing with large-scale and high-dimensional datasets. Although many methods have been proposed to evaluate data complexity, most of them fail to fully consider the interaction among different data characteristics, or the connection between class imbalance and these characteristics, which makes it hard to evaluate the difficulty of classification effectively. This paper presents a new data complexity measure, MFII (multi-factor imbalance index), which measures the combined effects of the skewed class distribution and data characteristics on classification difficulty. In particular, it comprehensively investigates the impact of overlap size, confusion degree, and sub-cluster structure. VoR (value of resolution) and DoC (degree of consistency) are also proposed to evaluate the resolution and interpretability of complexity measures. The experimental results demonstrate that MFII has lower VoR and a stronger correlation with classification metrics, which indicates that MFII can more accurately evaluate the difficulty of multi-class imbalanced classification tasks. • MFII considers class imbalance and various overlap factors to assess data complexity. • VoR and DoC are proposed to estimate the resolution and stability of complexity measures. • MFII has a strong negative correlation with classification results. [ABSTRACT FROM AUTHOR]
- Published
- 2025
15. CompNET: Boosting image recognition and writer identification via complementary neural network post-processing.
- Author
- Zhao, Bocheng, Cao, Xuan, Zhang, Wenxing, Liu, Xujie, Miao, Qiguang, and Li, Yunan
- Subjects
- *DISTRIBUTION (Probability theory), *PROBABILITY theory, *FORECASTING, *CLASSIFICATION, *AUTHORS
- Abstract
In current classification tasks, an important way to improve accuracy is to pre-train the model on a large-scale domain-specific dataset. However, many tasks such as writer identification (writerID) lack suitable large-scale datasets in practical scenarios. To address this issue, this paper proposes a method that improves prediction accuracy without relying on extensive pre-training, instead leveraging the diversity of the probability distributions predicted by multiple networks and enhancing top-1 accuracy through complementary post-processing. Specifically, top-k distributions are sampled separately from each of the probability mass functions. When the differences between these top-k distributions are maximized, their intersection, apart from the correct category, can be narrowed down. Finally, a correct target that received a suboptimal probability can be rectified using the single remaining element of the intersection. Furthermore, our method exhibited an intriguing trait during experimentation: its prediction accuracy improves as novel SOTA methods are incorporated, ultimately surpassing the performance of these new methods. • The proposed method utilizes information from top-k outputs to boost the accuracy of top-1 results without an extensive pre-training process. • This efficient post-processing framework can rectify cases where the correct class is assigned low confidence by fusing information from the top-k predictions. • Our simple, modular approach is decoupled and extendable to image classification tasks, achieving better performance. • Prediction accuracy improves with the integration of novel SOTA methods, ultimately surpassing their performance. [ABSTRACT FROM AUTHOR]
- Published
- 2025
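The complementary post-processing above can be illustrated with a simplified rule: take each network's top-k classes, intersect them, and accept the result when exactly one class survives. The abstract does not give the exact procedure for maximizing top-k distribution differences, so this is an assumed simplification, and the fallback to the averaged PMF is a hypothetical choice.

```python
def top_k(probs, k):
    """Indices of the k largest probabilities, highest first."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


def complementary_predict(prob_lists, k=2):
    """Pick one class from several networks' probability mass functions.

    Assumed rule: if the intersection of every network's top-k set is a
    single class, return it; otherwise fall back to the argmax of the mean PMF.
    """
    common = set(top_k(prob_lists[0], k))
    for probs in prob_lists[1:]:
        common &= set(top_k(probs, k))
    if len(common) == 1:
        return common.pop()
    n_classes = len(prob_lists[0])
    mean = [sum(p[c] for p in prob_lists) / len(prob_lists) for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)


# Hypothetical outputs of two networks over four writer classes.
net_a = [0.6, 0.3, 0.1, 0.0]
net_b = [0.0, 0.3, 0.7, 0.0]
print(complementary_predict([net_a, net_b], k=2))  # 1
print(complementary_predict([net_a, net_b], k=1))  # 2
```

Class 1 is only the second choice for both networks, yet it is the only class both rank highly; with k=1 the intersection is empty and the averaged PMF selects class 2 instead.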
16. The impact of class imbalance in classification performance metrics based on the binary confusion matrix.
- Author
- Luque, Amalia, Carrasco, Alejandro, Martín, Alejandro, and de las Heras, Ana
- Subjects
- *KEY performance indicators (Management), *NUMERICAL functions, *CLASSIFICATION, *SET functions, *STATISTICAL correlation
- Abstract
Highlights • An imbalance coefficient is introduced for measuring imbalance. • Geometric Mean and Bookmaker Informedness constitute the best unbiased metrics. • Matthews Correlation Coefficient is the best option when errors must be considered. • The concept of Class Balance Accuracy can be extended to other metrics. Abstract A major issue in the classification of class-imbalanced datasets is determining the most suitable performance metrics. Previous work has shown, using several examples, that imbalance can have a major impact on the value and meaning of accuracy and of certain other well-known performance metrics. In this paper, our approach goes beyond case studies and develops a systematic analysis of this impact by simulating the results obtained by binary classifiers. A set of functions and numerical indicators is obtained that enables comparison of the behaviour of several performance metrics based on the binary confusion matrix when they are faced with imbalanced datasets. In the paper, a new way to measure imbalance is defined that surpasses the Imbalance Ratio used in previous studies. From the simulation results, several clusters of performance metrics have been identified, pointing to Geometric Mean or Bookmaker Informedness as the best null-biased metrics if their focus on classification successes (dismissing the errors) presents no limitation for the specific application. However, if classification errors must also be considered, the Matthews Correlation Coefficient emerges as the best choice. Finally, a set of null-biased multi-perspective Class Balance Metrics is proposed, which extends the concept of Class Balance Accuracy to other performance metrics. [ABSTRACT FROM AUTHOR]
- Published
- 2019
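The metrics this paper compares are functions of the binary confusion matrix. The sketch below uses the standard textbook definitions of accuracy, Geometric Mean, Bookmaker Informedness, and the Matthews Correlation Coefficient; it does not reproduce the paper's imbalance coefficient or Class Balance Metrics. Returning an MCC of 0 when its denominator vanishes is a common convention, assumed here, and the function expects both classes to be present.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Imbalance-aware metrics from a binary confusion matrix."""
    tpr = tp / (tp + fn)   # sensitivity (recall on the positive class)
    tnr = tn / (tn + fp)   # specificity
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    gm = math.sqrt(tpr * tnr)  # Geometric Mean
    bm = tpr + tnr - 1         # Bookmaker Informedness (Youden's J)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # assumed convention
    return {"accuracy": accuracy, "GM": gm, "BM": bm, "MCC": mcc}


# 10 positives vs 990 negatives; a trivial classifier labels everything negative.
print(binary_metrics(tp=0, fn=10, fp=0, tn=990))
# {'accuracy': 0.99, 'GM': 0.0, 'BM': 0.0, 'MCC': 0.0}
```

The example shows why accuracy alone misleads under imbalance: a classifier that never detects a positive still scores 0.99, while GM, BM, and MCC all report no skill.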
17. Fast and accurate computation of Racah moment invariants for image classification.
- Author
- Benouini, Rachid, Batioua, Imad, Zenkouar, Khalid, Zahi, Azeddine, Fadili, Hakim El, and Qjidaa, Hassan
- Subjects
- *CLASSIFICATION, *FEATURE extraction, *IMAGE analysis, *MOMENTS method (Statistics), *IMAGE, *FORENSIC genetics
- Abstract
Highlights • This paper introduces a new set of moment invariants for image classification. • This paper provides a fast and accurate method for computing moment invariants. • Numerical experiments are performed to demonstrate its validity and superiority. Abstract In this paper, a new set of moment invariants, named Racah Moment Invariants (RMI), is introduced in the field of image analysis. This new set can be used to describe pattern features independently of rotation, scaling and translation transforms. Moreover, a new fast and accurate algorithm based on a recursive method is developed to accelerate the computation of the newly proposed invariants and to enhance their numerical stability. Subsequently, several experiments have been performed. First, the numerical stability and computational cost are depicted. Second, global and local feature extraction is illustrated. Then, the invariance property and noise robustness are investigated. Finally, the discrimination power and classification accuracy of the proposed invariants are extensively tested on several publicly available databases. The presented theoretical and experimental results clearly show that the proposed method can be extremely useful in the field of image classification. [ABSTRACT FROM AUTHOR]
- Published
- 2019
18. Coarse is better? A new pipeline towards self-supervised learning with uncurated images.
- Author
- Zhu, Ke, He, Yin-Yin, and Wu, Jianxin
- Subjects
- *DATA augmentation, *CROPS, *ANNOTATIONS, *CLASSIFICATION
- Abstract
Most self-supervised learning (SSL) methods work on curated datasets where the object-centric assumption holds. This assumption breaks down in uncurated images. Existing scene-image SSL methods try to find two views from the original scene images that are well matched or dense, which is both complex and computationally heavy. This paper proposes a conceptually different pipeline: first find regions that are coarse objects (with adequate objectness), crop them out as pseudo object-centric images, and then apply any SSL method directly, as on a real object-centric dataset. That is, coarse crops benefit scene-image SSL. A novel cropping strategy that produces coarse object boxes is proposed. The new pipeline and cropping strategy successfully learn quality features from uncurated datasets without ImageNet. Experiments show that our pipeline outperforms existing SSL methods (MoCo-v2, DenseCL and MAE) on classification, detection and segmentation tasks. We further conduct extensive ablations to verify that: (1) the pipeline does not rely on pretrained models; (2) the cropping strategy is better than existing object discovery methods; (3) our method is not sensitive to hyperparameters and data augmentations. • Coarse boxes are better than ground-truth annotations for SSL with uncurated scene images. • Generating views from pseudo-centric crops is a better pipeline. • A new and conceptually different pipeline for scene-image SSL. [ABSTRACT FROM AUTHOR]
- Published
- 2025
19. Bimodal Masked Autoencoders with internal representation connections for electrocardiogram classification.
- Author
- Wei, Yufeng, Lian, Cheng, Xu, Bingrong, Zhao, Pengbo, Yang, Honggang, and Zeng, Zhigang
- Subjects
- *FREQUENCY spectra, *TIME series analysis, *ELECTROCARDIOGRAPHY, *DATA modeling, *CLASSIFICATION
- Abstract
Time series self-supervised methods have been widely used, and electrocardiogram (ECG) classification tasks have also reaped their benefits. One mainstream paradigm is masked data modeling, which uses the visible part of the data to reconstruct the masked part, helping to acquire useful representations for downstream tasks. However, the traditional approach predominantly attends to time-domain information and places excessive reconstruction demands on the encoder, thereby hurting the model's discriminative ability. In this paper, we present Bimodal Masked autoencoders with Internal Representation Connections (BMIRC) for ECG classification. On the one hand, BMIRC integrates the frequency spectrum of ECG into the masked pre-training process, enhancing the model's comprehensive understanding of the ECG. On the other hand, it establishes internal representation connections (IRC) from the encoder to the decoder, which offer the decoder various levels of information to aid in reconstruction, thereby allowing the encoder to focus on modeling discriminative representations. We conduct comprehensive experiments across three distinct ECG datasets to validate the effectiveness of BMIRC. Experimental results demonstrate that BMIRC surpasses the competitive baselines in the majority of scenarios, encompassing both intra-domain (pre-training and fine-tuning on the same dataset) and cross-domain (pre-training and fine-tuning on different datasets) settings. The code is publicly available at https://github.com/Envy-Clouds/BMIRC. [ABSTRACT FROM AUTHOR]
- Published
- 2025
20. SoftPatch+: Fully unsupervised anomaly classification and segmentation.
- Author
- Wang, Chengjie, Jiang, Xi, Gao, Bin-Bin, Gan, Zhenye, Liu, Yong, Zheng, Feng, and Ma, Lizhuang
- Subjects
- *DATA scrubbing, *PROBLEM solving, *NOISE, *ALGORITHMS, *CLASSIFICATION
- Abstract
Although mainstream unsupervised anomaly detection (AD) algorithms (including image-level classification and pixel-level segmentation) perform well on academic datasets, their performance is limited in practical applications because they assume an idealized experimental setting with clean training data. Training with noisy data is an inevitable problem in real-world anomaly detection but is seldom discussed. This paper is the first to consider fully unsupervised industrial anomaly detection (i.e., unsupervised AD with noisy data). To solve this problem, we propose memory-based unsupervised AD methods, SoftPatch and SoftPatch+, which efficiently denoise the data at the patch level. Noise discriminators are used to generate outlier scores for patch-level noise elimination before coreset construction. The scores are then stored in the memory bank to soften the anomaly detection boundary. Compared with existing methods, SoftPatch maintains a strong modeling ability for normal data and alleviates the overconfidence problem in the coreset, and SoftPatch+ delivers more robust performance, which is particularly useful in real-world industrial inspection scenarios with high levels of noise (from 10% to 40%). Comprehensive experiments conducted in diverse noise scenarios demonstrate that both SoftPatch and SoftPatch+ outperform the state-of-the-art AD methods on the MVTecAD, ViSA, and BTAD benchmarks. Furthermore, the performance of SoftPatch and SoftPatch+ is comparable to that of noise-free methods in the conventional unsupervised AD setting. The code of the proposed methods can be found at https://github.com/TencentYoutuResearch/AnomalyDetection-SoftPatch. • We build a protocol for fully unsupervised anomaly classification and segmentation. • We propose SoftPatch, a patch-level denoising method for coreset memory bank. • We present SoftPatch+ using multiple discriminators for robust noise discovery. • We establish a baseline for fully unsupervised anomaly classification and segmentation. [ABSTRACT FROM AUTHOR]
- Published
- 2025
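A rough, hypothetical sketch of the general idea of patch-level denoising before building a memory bank: each training patch feature gets an outlier score (here its mean distance to its k nearest neighbours, a simple stand-in for the paper's noise discriminators), the highest-scoring fraction is discarded, and a test patch is scored by its distance to the nearest remaining patch. Coreset subsampling and the stored scores SoftPatch uses to soften the boundary are not modelled; all function names and parameters are illustrative assumptions.

```python
import math

# Hypothetical sketch: k-NN outlier scoring stands in for SoftPatch's noise discriminators.


def knn_outlier_scores(patches, k=2):
    """Mean Euclidean distance from each patch feature to its k nearest other patches."""
    scores = []
    for i, p in enumerate(patches):
        dists = sorted(math.dist(p, q) for j, q in enumerate(patches) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def build_denoised_bank(patches, drop_ratio=0.1, k=2):
    """Discard the patches with the highest outlier scores; the rest form the memory bank."""
    scores = knn_outlier_scores(patches, k)
    n_keep = len(patches) - int(len(patches) * drop_ratio)
    order = sorted(range(len(patches)), key=lambda i: scores[i])  # most "normal" first
    return [patches[i] for i in order[:n_keep]]


def anomaly_score(patch, bank):
    """Nearest-neighbour distance from a test patch feature to the memory bank."""
    return min(math.dist(patch, b) for b in bank)


# Usage: one noisy (anomalous) patch hidden among normal training patches.
train = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
bank = build_denoised_bank(train, drop_ratio=0.2)
print(len(bank), (5.0, 5.0) in bank)  # the outlier is filtered out of the bank
print(anomaly_score((0.05, 0.05), bank) < anomaly_score((5.0, 5.0), bank))
```

Filtering before the bank is built matters because a nearest-neighbour detector would otherwise treat any contaminating anomaly in the training set as "normal".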
21. IMWA: Iterative Model Weight Averaging benefits class-imbalanced learning.
- Author
- Huang, Zitong, Chen, Ze, Dong, Bowen, Liang, Chaoqi, Zhou, Erjin, and Zuo, Wangmeng
- Subjects
- *WEIGHT training, *MOVING average process, *VANILLA, *CLASSIFICATION
- Abstract
Model Weight Averaging (MWA) enhances model performance by averaging the weights of multiple trained models. This paper shows that (1) MWA is beneficial for class-imbalanced learning, and (2) early-epoch averaging yields the most improvement. Building on these insights, we propose Iterative Model Weight Averaging (IMWA) for class-imbalanced learning tasks. IMWA divides training into multiple episodes; within each episode, multiple models are trained from the same initial weights and then averaged into a single model. This averaged model initializes the next episode, making the approach iterative. IMWA offers larger performance improvements than MWA. Notably, several class-imbalanced learning methods use an Exponential Moving Average (EMA) to gradually update model weights for improved performance. Our IMWA method synergizes effectively with EMA-based approaches, enhancing overall performance. Extensive experiments validate IMWA's effectiveness across various class-imbalanced learning tasks, including classification and object detection. • We find that vanilla MWA performs well for class-imbalanced tasks and that early-epoch averaging yields greater gains, inspiring the design of IMWA. • IMWA iteratively conducts parallel training and weight averaging, and its integration with EMA shows their complementary benefits. • Extensive experiments demonstrate that IMWA outperforms vanilla MWA and effectively boosts performance for class-imbalanced learning. [ABSTRACT FROM AUTHOR]
- Published
- 2025
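The episode structure described in the IMWA abstract can be illustrated on a toy problem. The sketch below is a minimal, assumption-laden version: a 1-D linear regression trained with SGD stands in for a deep network, different shuffling seeds are assumed to supply the diversity between models within an episode, and the EMA integration is omitted. Function names and hyperparameters are hypothetical.

```python
import random

# Hypothetical toy setup: 1-D linear regression y ~ w*x + b trained with SGD.


def train_model(w, b, data, epochs, lr, seed):
    """Plain SGD on squared error, starting from the given (w, b)."""
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)  # a different sample order per seed gives model diversity
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


def average_weights(models):
    """Element-wise mean of parameter tuples (plain model weight averaging)."""
    n = len(models)
    return tuple(sum(params) / n for params in zip(*models))


def imwa(data, episodes=3, models_per_episode=4, epochs=2, lr=0.05):
    """Iterative averaging: each episode trains several models from the same
    starting weights, averages them, and the average seeds the next episode."""
    w, b = 0.0, 0.0
    for ep in range(episodes):
        models = [train_model(w, b, data, epochs, lr, seed=ep * 100 + m)
                  for m in range(models_per_episode)]
        w, b = average_weights(models)
    return w, b


data = [(x / 10, 2 * (x / 10) + 1) for x in range(-10, 11)]  # noise-free y = 2x + 1
w, b = imwa(data, episodes=5)
print(round(w, 2), round(b, 2))
```

Averaging short, parallel runs (rather than one long run) is what makes this "early-epoch" averaging in spirit; how many episodes and models to use is a tuning choice the sketch does not settle.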
22. Multiple instance learning: A survey of problem characteristics and applications.
- Author
- Carbonneau, Marc-André, Granger, Eric, Gagnon, Ghyslain, and Cheplygina, Veronika
- Subjects
- *SUPERVISED learning, *COMPUTER vision, *DOCUMENT classification (Electronic documents), *ARTIFICIAL intelligence, *ALGORITHMS
- Abstract
Multiple instance learning (MIL) is a form of weakly supervised learning where training instances are arranged in sets, called bags, and a label is provided for the entire bag. This formulation is gaining interest because it naturally fits various problems and makes it possible to leverage weakly labeled data. Consequently, it has been used in diverse application fields such as computer vision and document classification. However, learning from bags raises important challenges that are unique to MIL. This paper provides a comprehensive survey of the characteristics which define and differentiate the types of MIL problems. Until now, these problem characteristics have not been formally identified and described. As a result, the variations in performance of MIL algorithms from one data set to another are difficult to explain. In this paper, MIL problem characteristics are grouped into four broad categories: the composition of the bags, the types of data distribution, the ambiguity of instance labels, and the task to be performed. Methods specialized to address each category are reviewed. Then, the extent to which these characteristics manifest themselves in key MIL application areas is described. Finally, experiments are conducted to compare the performance of 16 state-of-the-art MIL methods on selected problem characteristics. This paper provides insight into how the problem characteristics affect MIL algorithms, recommendations for future benchmarking, and promising avenues for research. Code is available online at https://github.com/macarbonneau/MILSurvey. [ABSTRACT FROM AUTHOR]
- Published
- 2018
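To make the bag/instance distinction concrete, here is a minimal sketch of the standard MIL assumption (a bag is positive if and only if at least one of its instances is positive) and of two common ways to pool instance scores into a bag score. The survey covers many other assumptions and problem characteristics; this is only the textbook starting point, and the function names are hypothetical.

```python
def bag_label(instance_labels):
    """Standard MIL assumption: a bag is positive iff at least one instance is positive."""
    return int(any(instance_labels))


def bag_score(instance_scores, pooling="max"):
    """Pool instance-level scores (e.g. probabilities) into one bag-level score."""
    if pooling == "max":
        # Matches the standard assumption: one strongly positive instance suffices.
        return max(instance_scores)
    if pooling == "mean":
        # Closer to collective assumptions, where every instance contributes.
        return sum(instance_scores) / len(instance_scores)
    raise ValueError(f"unknown pooling: {pooling}")


bag = [0.1, 0.9, 0.2]  # hypothetical instance probabilities for one bag
print(bag_score(bag), bag_score(bag, pooling="mean"))
```

Which pooling is appropriate depends on exactly the kind of problem characteristic the survey categorizes, such as how many instances in a positive bag actually carry the label.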
23. Heterogeneous domain adaptation via incremental discriminative knowledge consistency.
- Author
- Lu, Yuwu, Lin, Dewei, Wen, Jiajun, Shen, Linlin, Li, Xuelong, and Wen, Zhenkun
- Subjects
- *MARGINAL distributions, *DATA distribution, *STATISTICAL correlation, *CLASSIFICATION, *GEOMETRY
- Abstract
Heterogeneous domain adaptation is a challenging problem in transfer learning since samples from the source and target domains reside in different feature spaces with different feature dimensions. The key problem is how to minimize the gaps (e.g., data distribution mismatch) between the two heterogeneous domains and produce highly discriminative representations for the target domain. In this paper, we address these challenges with the proposed incremental discriminative knowledge consistency (IDKC) method, which integrates cross-domain mapping, distribution matching, discriminative knowledge preservation, and domain-specific geometry structure consistency into a unified learning model. Specifically, we learn a domain-specific projection that maps the original samples into a common subspace in which the marginal distributions are well aligned and discriminative knowledge consistency is preserved by leveraging the labeled samples from both domains. Moreover, domain-specific structure consistency is enforced to preserve the data manifold from the original space to the common feature space in each domain. Meanwhile, we further apply pseudo-labeling to unlabeled target samples based on feature correlation and retain the pseudo labels with high feature correlation coefficients for the next learning iteration. Our pseudo-labeling strategy expands the number of labeled target samples in each category and thus enforces class-discriminative knowledge consistency to produce more discriminative feature representations for the target domain. Extensive experiments on several standard benchmarks for object recognition, cross-language text classification, and digit classification tasks verify the effectiveness of our method. • Distribution alignment is learnt to preserve discriminative knowledge consistency. • The data attribute can be preserved by optimizing the reconstruction loss.
• Pseudo labeling strategy is developed for classifying unlabeled target samples. • The developed IDKC framework outperforms the state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
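The correlation-based pseudo-labeling step in the IDKC abstract can be sketched as follows, under loudly stated assumptions: samples are compared with labeled class centroids in an already shared feature space, Pearson correlation plays the role of "feature correlation", and a fixed threshold decides which pseudo labels are retained for the next iteration. The projection learning, distribution matching, and structure-consistency terms of IDKC are not shown, and all names and the threshold value are hypothetical.

```python
import math

# Hypothetical sketch: correlation-based pseudo-labelling against class centroids.


def pearson(a, b):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0


def class_centroids(features, labels):
    """Mean feature vector per class from labelled samples."""
    sums, counts = {}, {}
    for f, c in zip(features, labels):
        acc = sums.setdefault(c, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[c] = counts.get(c, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def pseudo_label(unlabeled, centroids, threshold=0.9):
    """Assign each sample its most-correlated class; keep only confident assignments."""
    kept = []
    for idx, f in enumerate(unlabeled):
        label, corr = max(((c, pearson(f, m)) for c, m in centroids.items()),
                          key=lambda t: t[1])
        if corr >= threshold:
            kept.append((idx, label, corr))
    return kept


labelled = [[1, 2, 3, 4], [1, 2, 3, 5], [4, 3, 2, 1], [5, 3, 2, 1]]
cents = class_centroids(labelled, [0, 0, 1, 1])
unlabeled = [[2, 3, 4, 5], [4, 3, 2, 1], [1, 1, 2, 1]]
print(pseudo_label(unlabeled, cents))  # the weakly correlated third sample is not kept
```

Dropping low-correlation samples trades coverage for label quality; in an iterative scheme the retained set can grow as the shared feature space improves.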
24. BMPCN: A Bigraph Mutual Prototype Calibration Net for few-shot classification.
- Author
- Zhang, Jing, Chen, Mingzhe, Hu, Yunzuo, Zhang, Xinzhou, and Wang, Zhe
- Subjects
- *PRIOR learning, *PROTOTYPES, *CALIBRATION, *LEARNING modules, *CLASSIFICATION
- Abstract
In recent studies on few-shot classification, most of the existing methods utilized word embeddings as prior knowledge to adjust the distribution of visual prototypes. However, this straightforward fusion of visual and semantic features profoundly alters the feature distribution in the original feature space, rendering it unable to effectively calibrate feature distribution through mutual guidance of cross-modal information. To address this problem, we propose a novel Bigraph Mutual Prototype Calibration Network (BMPCN) for few-shot learning in this paper, in which we not only update the distribution of class features based on prototype-level similarity in both visual and semantic spaces but also facilitate the mutual guidance of visual and semantic feature updates through instance-level similarity. In the BMPCN, a bigraph mutual promotion structure is proposed, wherein a visual graph is constructed with visual features as nodes and the similarity between visual features as edges. Simultaneously, the semantic feature nodes are automatically generated from images, and the class-level prior knowledge is leveraged to correct these automatically generated semantic nodes. To better update the bigraph mutual promotion structure, we propose a Bigraph Interactive Augmentation Module (BIAM), a Nearest Neighbor Proto-level Similarity Promotion Module (NN-PSP), and a Proto-level Similarity Promotion Module (PK-PSP) based on original knowledge augmentation to perform the bigraph update. For inter-graph updating, we use the prototype-level similarity obtained from the NN-PSP and PK-PSP modules to fully learn task-level information, thus enabling task-specific prototype updates. For intra-graph updating, our visual and semantic graphs use instance-level similarity analysis to extract potential correlations between different feature domains and implement mutual guidance in the BIAM module to correct the feature distribution of visual and semantic features. 
Experiments on three widely used benchmarks show that our proposed method achieves excellent performance with the Conv-4 backbone, outperforming state-of-the-art methods by about 8% on miniImageNet, tieredImageNet, and CUB-200-2011. Code is available at https://github.com/cmzHome/BMPCN-MASTER. • A bigraph mutual prototype calibration net is proposed for few-shot classification. • A bigraph interactive augmentation module is proposed to realize mutual guidance. • A proto-level similarity with task-specific information is proposed. • An instance-level similarity with potential correlations is proposed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
25. DACBN: Dual attention convolutional broad network for fine-grained visual recognition.
- Author
- Chen, Tao, Wang, Lijie, Liu, Yang, and Yu, Haisheng
- Subjects
- *LABOR costs, *PROBLEM solving, *INSTRUCTIONAL systems, *RECOGNITION (Psychology), *CLASSIFICATION
- Abstract
Fine-grained visual classification (FGVC) is a challenging task due to its small inter-class differences and large intra-class differences. Most existing methods rely on manual labeling of key identification areas, which incurs high labor costs. In addition, existing methods tend to ignore the differing contributions of the channels in the feature map, which affects classification accuracy. To solve these problems, this paper proposes a dual attention convolutional broad network. First, a new dual attention mechanism is designed to suppress the background noise of fine-grained images and give greater weight to the discriminative feature regions and channels. Second, the ensemble broad learning system framework is used to further enhance the dual attention features, so that the discriminative features further improve the recognition ability of the model. Finally, multiple comparative experiments show that the proposed method achieves excellent recognition results on three commonly used datasets. • We propose a DACN and a DACBN for fine-grained recognition. • A new dual attention mechanism is designed to suppress the background noise of fine-grained images while giving greater weight to the discriminative areas. • The ensemble broad learning system framework is introduced to further enhance the dual attention features extracted by DACN. • By comparative experiments with state-of-the-art methods, the superiority of DACBN is verified. [ABSTRACT FROM AUTHOR]
- Published
- 2024
26. A novel hybrid decoding neural network for EEG signal representation.
- Author
- Ji, Youshuo, Li, Fu, Fu, Boxun, Zhou, Yijin, Wu, Hao, Li, Yang, Li, Xiaoli, and Shi, Guangming
- Subjects
- *MOTOR imagery (Cognition), *BRAIN-computer interfaces, *CONVOLUTIONAL neural networks, *ELECTROENCEPHALOGRAPHY, *CLASSIFICATION
- Abstract
• Combine the advantages of CNNs and multi-head mechanisms to decode EEG. • Depthwise separable convolution decouples temporal relevant information. • Multi-head mechanism adaptively modified for EEG focuses on spatial activation. • Three different BCI classification tasks can be decoded. In this paper, we propose a novel hybrid decoding model, called HCANN, that combines the strengths of CNNs and multi-head self-attention mechanisms to finely characterize EEG features. Depthwise separable convolution with multi-scale factors efficiently decouples temporally relevant information between brain-computer interface (BCI) tasks and EEG signals. The multi-head mechanism, adaptively modified for EEG, focuses on brain spatial activation patterns and extracts complementary spatial representation information from multiple subspaces. The proposed HCANN decodes the intent information of EEG recorded under three BCI paradigms, including one active and two passive BCI paradigms: rapid serial visual presentation, motor imagery, and imagined speech. We evaluated HCANN by comparing it with current state-of-the-art methods. The experimental results demonstrate that HCANN can effectively decode EEG and improves classification performance for all three BCI tasks. In addition, visualization of the spatial-temporal features at different decoding stages demonstrates that HCANN gradually extracts effective features related to the BCI tasks. The code of HCANN is publicly available at https://github.com/youshuoji/HCANN. [ABSTRACT FROM AUTHOR]
- Published
- 2024
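The efficiency argument behind depthwise separable convolution can be checked with simple parameter arithmetic: a standard kh x kw convolution needs kh*kw*C_in*C_out weights, whereas a depthwise convolution (one filter per input channel) followed by a 1x1 pointwise convolution needs kh*kw*C_in + C_in*C_out. The channel counts and the (1, 64) temporal kernel in the usage line are hypothetical, not HCANN's configuration, and biases are ignored by default.

```python
def standard_conv_params(c_in, c_out, kh, kw, bias=False):
    """Weights of a standard 2-D convolution with a kh x kw kernel."""
    return kh * kw * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(c_in, c_out, kh, kw, depth_multiplier=1, bias=False):
    """Depthwise kh x kw conv (depth_multiplier filters per input channel) + 1x1 pointwise conv."""
    mid = c_in * depth_multiplier
    depthwise = kh * kw * mid  # each filter sees only its own input channel
    pointwise = mid * c_out    # the 1x1 conv mixes information across channels
    biases = (mid + c_out) if bias else 0
    return depthwise + pointwise + biases


# Hypothetical EEG-style layer: 16 -> 32 feature maps, temporal kernel of length 64.
print(standard_conv_params(16, 32, 1, 64), depthwise_separable_params(16, 32, 1, 64))
# 32768 1536
```

The roughly 21x reduction here comes from separating "which time pattern" (depthwise) from "which channel combination" (pointwise), which is also why the depthwise stage can be read as isolating temporal information per channel.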
27. Graph Attentive Dual Ensemble learning for Unsupervised Domain Adaptation on point clouds.
- Author
- Li, Qing, Yan, Chuan, Hao, Qi, Peng, Xiaojiang, and Liu, Li
- Subjects
- *POINT cloud, *AUTODIDACTICISM, *CLASSIFICATION, *GENERALIZATION, *TEACHERS
- Abstract
Due to the difficulty of annotating point clouds, Unsupervised Domain Adaptation (UDA) is a promising direction for unlabeled point cloud classification and segmentation. Recent works show that adding a self-supervised learning branch for target domain training consistently boosts UDA point cloud tasks. However, most of these works simply resort to geometric deformation, which ignores semantic information and struggles to bridge the domain gap. In this paper, we propose a novel self-learning strategy for UDA on point clouds, termed Graph Attentive Dual Ensemble learning (GRADE), which delivers semantic information directly. Specifically, after pre-training on the source domain, GRADE builds dual collaborative training branches on the target domain, where each branch constructs a temporal average teacher model and distills its pseudo labels to the other branch. To obtain faithful labels from each teacher model, we improve the popular DGCNN architecture by introducing a dynamic graph attentive module to mine the relations between local neighborhood points. We conduct extensive experiments on several UDA point cloud benchmarks, and the results demonstrate that GRADE outperforms the state-of-the-art methods on both classification and segmentation tasks by clear margins. • We propose Graph Attentive Dual Ensemble (GRADE) for efficient semantic transfer in 3D point clouds. • We propose a dual ensemble network for consistent generalization and accurate reconstruction. • We propose a dynamic graph attentive module to enhance semantic features in 3D point clouds. • We propose an intra-domain mixup scheme to increase training sample diversity. • Experiments show GRADE achieves SOTA performance on 3D point cloud classification and segmentation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
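The GRADE abstract says each branch builds a temporal average teacher and distills its pseudo labels to the other branch. A common way to build such a teacher is an exponential moving average (EMA) of the student's weights, as in mean-teacher methods; the sketch below assumes that, treats parameters as a flat list, and exchanges confidence-thresholded argmax labels between branches. The decay, the threshold, and the function names are hypothetical, and GRADE's actual label filtering may differ.

```python
def ema_update(teacher, student, decay=0.99):
    """Temporal-average teacher: teacher <- decay * teacher + (1 - decay) * student."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def confident_labels(probs, threshold=0.8):
    """Argmax pseudo labels, with None where the top probability is below threshold."""
    labels = []
    for p in probs:
        best = max(range(len(p)), key=lambda c: p[c])
        labels.append(best if p[best] >= threshold else None)
    return labels


def cross_pseudo_labels(teacher_a_probs, teacher_b_probs, threshold=0.8):
    """Dual-branch exchange: branch A is supervised by teacher B's labels and vice versa."""
    labels_for_a = confident_labels(teacher_b_probs, threshold)
    labels_for_b = confident_labels(teacher_a_probs, threshold)
    return labels_for_a, labels_for_b


teacher = ema_update([1.0, 0.0], [0.0, 1.0], decay=0.9)
print(teacher, confident_labels([[0.9, 0.1], [0.6, 0.4]]))
```

Exchanging labels across branches, rather than letting each teacher supervise its own student, is meant to reduce the confirmation bias of a single self-training loop.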
28. Dynamic and static fusion mechanisms of infrared and visible images.
- Author
- Fang, Aiqing and Li, Ying
- Subjects
- *INFRARED imaging, *AUTODIDACTICISM, *CLASSIFICATION, *DEEP learning
- Abstract
This paper proposes a dynamic fusion mechanism for infrared and visible images, named DIFM, to solve the static fusion optimization problem. DIFM correlates image fusion quality with image restoration quality to construct a unified optimization loss function. Based on DIFM, a dynamic image fusion network for infrared and visible images, denoted DF-Net, is constructed. Specifically, DF-Net comprises two modules, i.e., the dynamic fusion module (DFM) and the self-learning dynamic restoration module (SLDRM). To solve the static fusion problem of existing methods, the DFM is proposed to learn the fusion weight dynamically. The DFM comprises a classification module (CM) and an image fusion module (IFM), which determine whether and how to fuse the source images. In addition, a unified fusion loss function is introduced to obtain more hidden features of infrared and visible images in complex environments. As a result, static fusion, a stumbling block of deep learning in image fusion, is significantly mitigated. Extensive experiments demonstrate that the dynamic fusion optimization method outperforms the state-of-the-art methods in most metrics. • A dynamic fusion optimization mechanism is proposed for the static fusion problem. • We develop a dynamic unsupervised learning image fusion network that performs adaptive fusion operations based on the image quality. • A unified image fusion loss function is provided for learning the dynamic fusion parameters of infrared and visible images. • Extensive experiments on 3 public datasets are carried out. Our proposal achieves the best overall performance compared with several existing methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
29. Pseudo-set Frequency Refinement architecture for fine-grained few-shot class-incremental learning.
- Author
- Pan, Zicheng, Zhang, Weichuan, Yu, Xiaohan, Zhang, Miaohua, and Gao, Yongsheng
- Subjects
- *MACHINE learning, *LEARNING ability, *GENERALIZATION, *CLASSIFICATION
- Abstract
Few-shot class-incremental learning was introduced to solve the model adaptation problem for new incremental classes with only a few examples while remaining effective for old data. Although recent state-of-the-art methods have made some progress in improving system robustness on common datasets, they fail on fine-grained datasets, where inter-class differences are small. The problem is mainly caused by: (1) the overlap of new and old data in the feature space during incremental learning, which means old samples can be falsely classified as newly introduced classes, inducing catastrophic forgetting; and (2) a lack of discriminative feature learning ability to identify fine-grained objects. In this paper, a novel Pseudo-set Frequency Refinement (PFR) architecture is proposed to tackle these problems. We design a pseudo-set training strategy to mimic incremental learning scenarios so that the model can better adapt to novel data in future incremental sessions. Furthermore, separate adaptation tasks are developed that use frequency-based information to refine the original features and address the above challenges. More specifically, the high- and low-frequency components of the images are employed to enrich the model's discriminative feature analysis ability and incremental learning ability, respectively. The refined features are used to perform inter-class and inter-set analyses. Extensive experiments show that the proposed method consistently outperforms the state-of-the-art methods on four fine-grained datasets. • Developed a pseudo-set frequency refinement method for fine-grained FSCIL tasks. • Separated input image into high- and low-frequency components for better learning. • Enhanced the model's discrimination and generalization abilities with the new method. • Introduced a pseudo-set training strategy to mimic incremental learning scenarios.
• Created FSCIL benchmarks using four fine-grained datasets for the first time. [ABSTRACT FROM AUTHOR]
- Published
- 2024
30. AMMD: Attentive maximum mean discrepancy for few-shot image classification.
- Author
- Wu, Ji, Wang, Shipeng, and Sun, Jian
- Subjects
- *IMAGE recognition (Computer vision), *CLASSIFICATION
- Abstract
Metric-based methods have attained promising performance for few-shot image classification. Maximum Mean Discrepancy (MMD) is a typical distance between distributions, which requires computing expectations with respect to the data distributions. In this paper, we propose Attentive Maximum Mean Discrepancy (AMMD) to measure the distances between query images and support classes for few-shot classification. Each query image is classified as the support class with the minimal AMMD distance. The proposed AMMD equips MMD with distributions adaptively estimated by an Attention-based Distribution Generation Module (ADGM). ADGM learns to put more mass on more discriminative features, which makes the AMMD distance emphasize discriminative features and overlook spurious ones. Extensive experiments show that AMMD achieves competitive or state-of-the-art performance on multiple few-shot classification benchmark datasets. Code is available at https://github.com/WuJi1/AMMD. • We propose Attentive Maximum Mean Discrepancy (AMMD) metric in few-shot learning. • We use attention mechanisms to learn the importance of features in distributions. • Extensive experiments and analyses verify the effectiveness of the proposed AMMD. [ABSTRACT FROM AUTHOR]
- Published
- 2024
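The classification rule in the AMMD abstract (assign the query to the support class with the smallest distribution distance) can be sketched with a weighted squared MMD between two discrete distributions over local features. In AMMD the weights come from the learned ADGM; here they are simply passed in, and uniform weights reduce to the usual biased MMD estimate. The Gaussian kernel, its bandwidth, and the toy features are assumptions for illustration.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return math.exp(-math.dist(x, y) ** 2 / (2.0 * sigma ** 2))


def _weighted_kernel_sum(xs, ws, ys, vs, sigma):
    return sum(wi * vj * gaussian_kernel(xi, yj, sigma)
               for xi, wi in zip(xs, ws) for yj, vj in zip(ys, vs))


def weighted_mmd2(xs, ws, ys, vs, sigma=1.0):
    """Squared MMD between sum_i ws[i]*delta(xs[i]) and sum_j vs[j]*delta(ys[j]).

    Each weight list is assumed non-negative and summing to 1; uniform weights
    give the usual biased (V-statistic) MMD^2 estimate.
    """
    return (_weighted_kernel_sum(xs, ws, xs, ws, sigma)
            + _weighted_kernel_sum(ys, vs, ys, vs, sigma)
            - 2.0 * _weighted_kernel_sum(xs, ws, ys, vs, sigma))


def uniform(n):
    return [1.0 / n] * n


def classify(query, query_w, support, sigma=1.0):
    """support maps class label -> (local features, weights); pick the smallest MMD^2."""
    return min(support, key=lambda c: weighted_mmd2(query, query_w, *support[c], sigma=sigma))


# Toy local features for one query image and two support classes.
support = {"A": ([(0.0, 0.0), (0.0, 0.1)], uniform(2)),
           "B": ([(3.0, 3.0), (3.0, 3.1)], uniform(2))}
query = [(0.0, 0.0), (0.1, 0.0)]
print(classify(query, [0.8, 0.2], support))
```

Shifting weight toward some features changes which parts of each set dominate the distance, which is the lever an attention module like ADGM would pull.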
31. Discovering attention-guided cross-modality correlation for visible–infrared person re-identification.
- Author
- Yu, Hao, Cheng, Xu, Cheng, Kevin Ho Man, Peng, Wei, Yu, Zitong, and Zhao, Guoying
- Subjects
- *INFRARED imaging, *LOGITS, *CLASSIFICATION, *ATTENTION
- Abstract
Visible–infrared person re-identification (VI Re-ID) is an essential and challenging task. Existing studies mainly focus on learning unified modality-invariant representations directly from visible and infrared images. However, it is hard to obtain identity-aware patterns due to the co-existence of inter- and intra-modality discrepancies. In this paper, we propose a novel attention-guided cross-modality correlation method (AGCC) to achieve modality-invariant and identity-discriminative representations for visible–infrared person Re-ID. Specifically, we introduce a modality-aware attention (MAA) mechanism to model the inter- and intra-modality variations, which generates attention masks for the two modalities to preserve the most significant regions and obtain the discriminative patterns of each identity. Further, we present an attention-guided channel and spatial correlation scheme (AGCSC) to establish the attention-guided cross-modality correlation, which can bridge the inter- and intra-modality gaps. Moreover, a novel joint-modality learning head (JMLH) is developed to promote metric and mutual learning at both the feature distribution and classification logit levels. Extensive experiments on two public datasets, SYSU-MM01 and RegDB, demonstrate the remarkable superiority of our method over the state of the art. The implementation code will be made available soon. • A novel attention-guided cross-modality correlation approach for VI Re-ID. • A modality-aware attention mechanism is utilized to mine modality-shared regions and discriminative patterns. • An attention-guided channel and spatial correlation scheme is developed to relieve the modality discrepancies. • An effective joint-modality learning head is designed to promote metric and mutual learning at the feature distribution and classification logit levels. [ABSTRACT FROM AUTHOR]
- Published
- 2024
32. Learning a complex network representation for shape classification.
- Author
- Ribas, Lucas C. and Bruno, Odemir M.
- Subjects
- *COMPUTER vision, *CLASSIFICATION, *NAIVE Bayes classification, *IMAGE representation
- Abstract
Shape contour is a key low-level characteristic, making shape description an important aspect of many computer vision problems, with several challenges such as variations in scale, rotation, and noise. In this paper, we introduce an approach for shape analysis and classification from binary images based on representations learned by applying Randomized Neural Networks (RNNs) to feature maps derived from a Complex Network (CN) framework. Our approach models the contour as a complex network and computes its topological measures using a dynamic evolution strategy. This evolution of the CN provides significant information about the physical aspects of the shape's contour. Therefore, we propose embedding the topological measures computed from the dynamics of the CN evolution into a matrix representation, which we name the Topological Feature Map (TFM). Then, we employ the RNN to learn representations from the TFM through a sliding window strategy. The proposed representation is formed by the learned weights between the hidden and output layers of the RNN. Our experimental results show performance improvements in shape classification using the proposed method on two generic shape datasets. We also applied our approach to the recognition of plant leaves, achieving high performance in this challenging task. Furthermore, the proposed approach has demonstrated robustness to noise and invariance to scale and orientation transformations of the shapes. • A complex representation for shape classification is proposed. • It is based on combining a complex network model and randomized neural networks. • Shape contours are modeled as complex networks. • Randomized neural networks learn the features from the modeled complex networks. • The Complex Representations are the output weights of the neural network. [ABSTRACT FROM AUTHOR]
- Published
- 2024
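The abstract does not spell out the network construction, so the sketch below follows a common complex-network shape model as an assumption: every contour point is a node, two nodes are linked when their distance (normalized by the largest pairwise distance, which makes the measure scale-invariant) is below a threshold, and degree statistics are recorded as the threshold grows. Stacking such measures over thresholds gives a matrix loosely in the spirit of the TFM; the randomized-neural-network stage is not shown, and the function name is hypothetical.

```python
import math


def degree_evolution(contour, thresholds):
    """Degree statistics of a contour network as the distance threshold grows.

    Nodes are contour points; i and j are linked when their distance divided by
    the largest pairwise distance is <= t. Returns (mean, max) normalised degree
    for each threshold t.
    """
    n = len(contour)
    d = [[math.dist(p, q) for q in contour] for p in contour]
    dmax = max(max(row) for row in d) or 1.0  # guard against a degenerate contour
    features = []
    for t in thresholds:
        degrees = [sum(1 for j in range(n) if j != i and d[i][j] / dmax <= t) / n
                   for i in range(n)]
        features.append((sum(degrees) / n, max(degrees)))
    return features


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(degree_evolution(square, [0.5, 0.8, 1.0]))
```

Because distances are only compared after normalization, scaling the contour leaves the features unchanged; distances are also unaffected by rotation.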
33. Low-cost orthogonal basis-core extraction for classification and reconstruction using tensor ring.
- Author
- Akhter, Suravi, Alam, Muhammad Mahbub, Islam, Md. Shariful, Momen, M. Arshad, and Shoyaib, Mohammad
- Subjects
- *CLASSIFICATION, *COMPUTATIONAL complexity
- Abstract
Tensor-based methods have gained popularity for their ability to represent multi-aspect real-world data in a lower-dimensional space. Among them, methods with orthogonal factors perform relatively better in classification. However, most of them cannot handle higher-order data. Recently, Tensor Ring (TR) based methods have been proposed to address the higher-order issue more effectively, focusing on both classification and reconstruction. A TR-based method with orthogonal cores performs reasonably well for a given error. However, its computational complexity is very high and it might produce extra features. To solve these issues, in this paper, we propose a method named Orthogonal basis-core extraction using Tensor Ring (OTR) that facilitates better discrimination and reconstruction at a lower cost. To maintain the ring property, we also show theoretically that reshaping the product of semi-orthogonal reshaped cores remains semi-orthogonal. Rigorous experiments on eighteen benchmark datasets from different fields demonstrate the superiority of OTR over state-of-the-art methods in terms of classification and reconstruction. • Orthogonal basis-core extraction using Tensor Ring (OTR) can deal with higher-order data. • It produces semi-orthogonal basis-cores at a lower cost. • We prove that the basis-cores and their circular products are semi-orthogonal. • OTR captures discriminative information reasonably well in fewer steps. • Basis-cores obtained using OTR can be used for reconstruction of the original tensor. [ABSTRACT FROM AUTHOR]
- Published
- 2024
34. Optimization of classifier chains via conditional likelihood maximization.
- Author
- Sun, Lu and Kudo, Mineichi
- Subjects
- *CLASSIFICATION, *PROBLEM solving, *MATHEMATICAL optimization, *FEATURE selection, *DEPENDENCE (Statistics)
- Abstract
Multi-label classification associates an unseen instance with multiple relevant labels. In recent years, a variety of methods have been proposed to handle multi-label problems. Classifier chains is one of the most popular multi-label methods because of its efficiency and simplicity. In this paper, we consider optimizing classifier chains from the viewpoint of conditional likelihood maximization. In the proposed unified framework, classifier chains can be optimized in either or both of two aspects: label correlation modeling and multi-label feature selection. We show that previous classifier chain algorithms can be expressed within the unified framework. In addition, previous information-theoretic multi-label feature selection algorithms can be expressed under different assumptions on the feature and label spaces. Based on these analyses, we propose a novel multi-label method, k-dependence classifier chains with label-specific features, and demonstrate its effectiveness. [ABSTRACT FROM AUTHOR]
- Published
- 2018
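Classifier chains rest on the chain-rule factorization P(y | x) = ∏_j P(y_j | x, y_1, ..., y_{j-1}), which is the conditional likelihood the abstract proposes to maximize. A minimal sketch of standard chain inference and of that likelihood, assuming each link is a hypothetical callable returning P(y_j = 1 | x, earlier labels); this is plain greedy classifier-chain prediction, not the k-dependence method proposed in the paper.

```python
from typing import Callable, List, Sequence

# A link returns P(y_j = 1 | x, y_1..y_{j-1}); these callables are hypothetical
# stand-ins for trained binary classifiers.
Link = Callable[[Sequence[float], List[int]], float]


def chain_predict(x: Sequence[float], links: List[Link], threshold: float = 0.5) -> List[int]:
    """Greedy chain inference: each link sees x plus the labels predicted before it."""
    labels: List[int] = []
    for link in links:
        p = link(x, labels)
        labels.append(1 if p >= threshold else 0)
    return labels


def chain_likelihood(x: Sequence[float], y: Sequence[int], links: List[Link]) -> float:
    """Conditional likelihood P(y | x) = prod_j P(y_j | x, y_1..y_{j-1})."""
    prob = 1.0
    for j, link in enumerate(links):
        p1 = link(x, list(y[:j]))
        prob *= p1 if y[j] == 1 else 1.0 - p1
    return prob


# Toy links (hypothetical): label 0 depends on x[0]; label 1 follows label 0.
links: List[Link] = [
    lambda x, prev: 0.9 if x[0] > 0 else 0.1,
    lambda x, prev: 0.8 if prev[0] == 1 else 0.2,
]

print(chain_predict([1.0], links))             # [1, 1]
print(chain_likelihood([1.0], [1, 1], links))  # approximately 0.72 (0.9 * 0.8)
```

Greedy prediction is only one decoding choice; because each link conditions on earlier labels, the label order and which earlier labels a link may see (the "k-dependence" idea in the title) both affect the result.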
35. Time-series averaging using constrained dynamic time warping with tolerance.
- Author
- Morel, Marion, Achard, Catherine, Kulpa, Richard, and Dubuisson, Séverine
- Subjects
- *TIME series analysis, *CONSTRAINED optimization, *DATA mining, *SIGNAL processing, *CLASSIFICATION
- Abstract
In this paper, we propose a novel method for averaging a set of time series based on Dynamic Time Warping (DTW). DTW is widely used in data mining since it provides not only a similarity measure but also a temporal alignment of time series. However, its use is often restricted to pairs of signals. In this paper, we propose to extend its application to a set of signals by providing an average time series, which opens up a wide range of applications in data mining. Starting from a well-established method called DBA (DTW Barycenter Averaging), this paper points out its limitations and suggests an alternative based on Constrained Dynamic Time Warping. Second, a novel tolerance is added to take into account the admissible variability around the average signal. This new modeling of time series is evaluated on a classification task applied to several datasets, and the results show that it outperforms state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2018
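DTW provides both a distance and a temporal alignment through dynamic programming, and a common way to constrain it is a Sakoe-Chiba band that forbids matching points more than w steps apart. A minimal sketch of band-constrained DTW with path backtracking; the band is an assumed example of a constraint, and neither DBA averaging nor the paper's specific constraint and tolerance model is reproduced here.

```python
import math
from typing import List, Optional, Sequence, Tuple


def dtw(a: Sequence[float], b: Sequence[float],
        window: Optional[int] = None) -> Tuple[float, List[Tuple[int, int]]]:
    """DTW cost and warping path, optionally limited to a Sakoe-Chiba band."""
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide so the end cell stays reachable.
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2  # squared local distance
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from (n, m) to recover the temporal alignment.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


dist, path = dtw([0.0, 1.0, 2.0], [0.0, 1.0, 1.0, 2.0], window=1)
print(dist, path)
```

The path is what makes averaging possible: methods such as DBA use the alignments of many series to a reference to decide which samples should be averaged together.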
36. Scatter matrix decomposition for jointly sparse learning.
- Author
- Mo, Dongmei, Lai, Zhihui, Zhou, Jie, and Hu, Qinghua
- Subjects
- *MATRIX decomposition, *S-matrix theory, *FISHER discriminant analysis, *PATTERN recognition systems, *ORTHOGONAL decompositions, *ORTHOGONAL matching pursuit, *FEATURE extraction, *COMPUTER vision, *EIGENANALYSIS
- Abstract
• This paper solves the problem of orthogonal linear discriminant analysis (OLDA) from the novel viewpoint of scatter matrix orthogonal decomposition. • The method can obtain approximately orthogonal sparse discriminative vectors for dimensionality reduction and jointly sparse feature extraction. • Theoretical analysis shows that OLDA can be derived by the constrained scatter matrix decomposition. • The method outperforms several well-known LDA-based and sparse learning methods on four data sets (COIL100, USPS, ICADAR2003, and CMU PIE). Orthogonal Linear Discriminant Analysis (OLDA) based on the generalized eigen-equation is widely used in computer vision and pattern recognition. However, the performance of OLDA for feature extraction and classification needs to be improved, as it lacks the sparsity needed for better interpretation of the features. Moreover, computing orthogonal sparse projections based on LDA is very difficult and remains unsolved. To solve these problems, in this paper, we propose a method called Jointly Sparse Orthogonal Linear Discriminant Analysis (JSOLDA). Unlike existing OLDA, JSOLDA is derived from a novel viewpoint of scatter matrix decomposition. Theoretical analysis shows that OLDA can be derived by the constrained scatter matrix decomposition. In addition, by imposing the L2,1-norm on the penalty term, the proposed JSOLDA can obtain jointly sparse orthogonal projections to perform feature extraction. We also design an iterative algorithm to obtain the optimal solution. A systematic theoretical analysis of the relationship between OLDA and JSOLDA is presented, and convergence and computational complexity are also discussed. Experimental results on four data sets (COIL100, USPS, ICADAR2003, and CMU PIE) indicate that JSOLDA outperforms several well-known LDA-based and L2,1-norm-based methods. [ABSTRACT FROM AUTHOR]
- Published
- 2023
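The "jointly sparse" property comes from the L2,1-norm, ||W||_{2,1} = Σ_i ||w_i||_2, the sum of the Euclidean norms of the rows of the projection matrix. Penalizing it drives whole rows to zero, so a feature is discarded by every projection vector at once. A minimal sketch, assuming rows index features and columns index projection directions (some papers use the transposed convention); it only evaluates the norm and reads off the selected features, it does not solve the JSOLDA optimization.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def row_norm(row: Sequence[float]) -> float:
    """Euclidean norm of one row (one feature across all projections)."""
    return math.sqrt(sum(v * v for v in row))


def l21_norm(w: Matrix) -> float:
    """||W||_{2,1}: sum of the L2 norms of the rows of W."""
    return sum(row_norm(row) for row in w)


def selected_features(w: Matrix, tol: float = 1e-8) -> List[int]:
    """Indices of rows with non-negligible norm; zero rows are dropped jointly."""
    return [i for i, row in enumerate(w) if row_norm(row) > tol]


# 4 features projected to 2 dimensions; rows 1 and 3 are zero, so those
# features are ignored by both projection vectors.
W = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
print(l21_norm(W))           # 6.0
print(selected_features(W))  # [0, 2]
```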
37. Fuzzy granular recurrence plot and quantification analysis: A novel method for classification.
- Author
- He, Qian, Yu, Fusheng, Chang, Jiaqi, and Ouyang, Chenxi
- Subjects
- *TIME series analysis, *NONLINEAR analysis, *JUDGMENT (Psychology), *CLASSIFICATION, *PREDICATE calculus, *DYNAMICAL systems
- Abstract
• We propose a method for constructing a fuzzy granular recurrence plot (the FGRP method). • We design a classification model based on the FGRP method. • Experiments show that the FGRP method performs well in reducing the influence of noise. • Experiments show that the FGRP-based classification model performs well. Recently, the recurrence plot (RP) and its quantification techniques have become an important research tool in nonlinear analysis. In existing research, an RP is built directly on a time series, ignoring the influence of noise in the data, which affects judgements about the dynamic properties of a system. To tackle this problem, this paper proposes a novel recurrence plot, namely the fuzzy granular recurrence plot (FGRP). The FGRP of a time series is built not directly on the time series itself but on its corresponding granular time series, which is composed of fuzzy information granules. Fuzzy information granules are used as the building blocks of an FGRP to achieve high-level, compact, and understandable signal models. To apply the FGRP method to time series classification tasks, an FGRP-based classification model is designed in this paper. Subsequent experiments show that the FGRP of a time series can reduce the effect of noise and that the FGRP-based classification model can improve classification performance. [ABSTRACT FROM AUTHOR]
- Published
- 2023
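The FGRP generalizes the classical recurrence plot, in which R_ij = 1 when states i and j are within a threshold ε of each other; recurrence quantification analysis then summarizes the matrix with measures such as the recurrence rate (the fraction of recurrent points). A minimal sketch for a scalar series without time-delay embedding, with ε chosen arbitrarily for illustration; the fuzzy granulation step that distinguishes the FGRP is not shown.

```python
from typing import List, Sequence


def recurrence_plot(x: Sequence[float], eps: float) -> List[List[int]]:
    """Binary recurrence matrix: R[i][j] = 1 if |x_i - x_j| <= eps (no embedding)."""
    return [[1 if abs(xi - xj) <= eps else 0 for xj in x] for xi in x]


def recurrence_rate(r: List[List[int]]) -> float:
    """Fraction of recurrent points, including the main diagonal."""
    n = len(r)
    return sum(map(sum, r)) / (n * n)


series = [0.0, 1.0, 0.1, 1.05, 0.0]
R = recurrence_plot(series, eps=0.2)
for row in R:
    print(row)
print(recurrence_rate(R))
```

Because each cell compares raw samples, a single noisy value can flip a whole row and column of the plot; building the plot on granules instead of raw points is the abstract's answer to that sensitivity. Some RQA definitions exclude the main diagonal from the recurrence rate.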
38. A general framework for implementing distances for categorical variables.
- Author
- Velden, Michel van de, D'Enza, Alfonso Iodice, Markos, Angelos, and Cavicchia, Carlo
- Subjects
- *INDEPENDENT variables, *DATA analysis
- Abstract
The degree to which objects differ from each other with respect to observations on a set of variables plays an important role in many statistical methods. Many data analysis methods require a quantification of the differences in the observed values, which we can call distances. An appropriate definition of distance depends on the nature of the data and the problem at hand. For numerical variables, there exist many distance definitions that depend on the size of the observed differences. For categorical data, defining a distance is more complex, as there is no straightforward quantification of the size of the observed differences. In this paper, we introduce a flexible framework for efficiently computing distances for categorical variables, supporting existing and new formulations tailored to specific contexts. In supervised classification, it enhances performance by integrating relationships between the response and predictor variables. The framework allows differences among objects to be measured across diverse data types and domains. • Measuring distances among objects is important in many statistical methods. • Defining distance accurately depends on the nature of the data and the specific problem. • The definition of a categorical distance is not straightforward. • An efficient framework for implementing distances for categorical variables is needed. • The framework can help improve the performance of distance-based classifiers. [ABSTRACT FROM AUTHOR]
- Published
- 2024
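The simplest categorical distance, and a common baseline that such frameworks generalize, is simple matching (overlap): the proportion of variables on which two objects take different categories. A minimal sketch of overlap distance and a pairwise distance matrix on made-up objects; the framework in the paper supports richer, data- and problem-dependent formulations that are not reproduced here.

```python
from typing import List, Sequence


def overlap_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Simple matching distance: share of variables on which two objects differ."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of variables")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(data: List[Sequence[str]]) -> List[List[float]]:
    """Pairwise overlap distances between all objects (rows of data)."""
    return [[overlap_distance(p, q) for q in data] for p in data]


objects = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
]
D = distance_matrix(objects)
for row in D:
    print(row)
```

Overlap treats every mismatch as equally large, for example "red" vs "blue" counts the same as "small" vs "large"; this is exactly the limitation the abstract points to when it notes that there is no straightforward size of a categorical difference.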
39. Re-decoupling the classification branch in object detectors for few-class scenes.
- Author
- Hua, Jie, Wang, Zhongyuan, Zou, Qin, Xiao, Jinsheng, Tian, Xin, and Zhang, Yufei
- Subjects
- *DETECTORS, *FEATURE extraction, *CLASSIFICATION, *BINARY sequences, *AUTONOMOUS vehicles, *SUPERVISED learning
- Abstract
Few-class object detection is a critical task in numerous scenes, such as autonomous driving and intelligent surveillance. Current research mainly focuses on the correlation or decoupling between the classification and regression subtasks in object detection. However, it rarely exploits the potential of re-decoupling the classification subtask. In this paper, we propose to re-decouple the classification branch in object detection for few-class scenes by reducing multi-classification features to multiple binary-classification features. Since multi-classification losses cannot supervise the network to learn decoupled binary-classification features, we introduce a single-class loss to supervise the decoupled binary-classification branches. In particular, we propose a basic feature degradation head (FD-Head) structure that decouples the classification branches and applies a binary-classification loss to encourage each branch to learn only the degraded single-class features. In addition, based on the mutual exclusion between classes, we propose a mutual exclusion constraint (FD-Head-M) module that constrains the scores of all classes, improving detector performance. Finally, we replace the original convolution with more powerful feature extraction modules to form the enhanced FD-Head (FD-Head-E). Notably, our method can be used as a universal module and embedded into existing object detectors to boost their performance. When applied to typical object detectors, our method achieves performance gains of 1.2–2.2% and 1.7–2.5% on the KITTI-3 and SeaShips datasets, respectively. With ResNet50 as the backbone network, our method achieves an accuracy of 45.9% on the MS COCO dataset. • A feature degradation head for few-class scenes is proposed. • All sub-branches are guided to learn more significant single-class features. • A mutually exclusive constraint module is proposed to reduce false detections.
• The feature degradation head has also been extended to multi-class scenes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
40. Weakly privileged learning with knowledge extraction.
- Author
- Fu, Saiji, Dong, Tianyi, Wang, Zhaoxin, and Tian, Yingjie
- Subjects
- *SUPERVISED learning, *EXTRACTION techniques, *PERFORMANCE standards, *MACHINE learning
- Abstract
Learning using privileged information (LUPI) has shown promise in improving supervised learning by embedding additional knowledge. However, its reliance on the assumption of readily available privileged information may not hold true in practical scenarios due to limitations in access or confidentiality. To address these challenges, this paper presents a novel weakly privileged learning (WPL) framework, integrating knowledge extraction methods within the LUPI context. An effective strategy is proposed to implement the WPL framework, where knowledge extraction techniques generate a weight matrix as weak privileged information. Extensive experiments employing various existing knowledge extraction techniques demonstrate that the proposed WPL outperforms traditional supervised learning and approaches the performance of standard privileged learning where privileged information is given in advance. This research establishes WPL as a promising learning paradigm, addressing limitations in privileged information availability and advancing the field of machine learning in practical settings. • Introduce a novel weakly privileged learning framework (WPL). • Knowledge extraction methods are used to generate weak privileged information. • Extensive experiments verify the superiority of the proposed methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
41. Generalization Memorization Machine with Zero Empirical Risk for Classification.
- Author
- Wang, Zhen, Bai, Lan, and Shao, Yuanhai
- Subjects
- *MEMORIZATION, *GENERALIZATION, *QUADRATIC programming, *SUPPORT vector machines
- Abstract
Classifying the training data correctly without over-fitting is one of the goals of machine learning. In this paper, we propose a general Generalization Memorization Machine (GMM) that obtains zero empirical risk with better generalization. Widely applied loss-based learning models can be extended by the GMM to improve their memorization and generalization abilities. Specifically, we propose two new models based on the GMM, called the Hard Generalization Memorization Machine (HGMM) and the Soft Generalization Memorization Machine (SGMM). Both HGMM and SGMM obtain zero empirical risk with good generalization, and SGMM further improves the capacity and applicability of HGMM. The optimization problems in the proposed models are quadratic programming problems and can be solved efficiently. Additionally, the recently proposed generalization memorization kernel and the corresponding support vector machine are special cases of our SGMM. Experimental results demonstrate the effectiveness of the proposed HGMM and SGMM in both memorization and generalization. • An SVM-type model (GMM) with zero empirical risk is proposed. • Any loss-based learning model can be improved by GMM directly. • The SVMm proposed by Vapnik is a special case of the proposed model. • The zero empirical risk of the proposed models is proved theoretically. • Experiments verify the zero empirical risk and competitive generalization of GMMs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
42. A thorough experimental comparison of multilabel methods for classification performance.
- Author
- García-Pedrajas, Nicolás E., Cuevas-Muñoz, José M., Cerruela-García, Gonzalo, and de Haro-García, Aida
- Subjects
- *DATA mining, *CLASSIFICATION, *RESEARCH personnel, *NAIVE Bayes classification
- Abstract
Multilabel classification as a data mining task has recently attracted increasing interest from researchers. Many current data mining applications address problems with instances that belong to more than one class. These problems require the development of new, efficient methods. Exploiting the correlation among labels can yield better performance than methods that handle each label separately. In recent decades, many methods have been developed to deal with multilabel datasets, which makes it difficult to decide which method is the most appropriate for a given task. In this paper, we present the most comprehensive comparison carried out so far. We compare a total of 62 different methods, with several configurations of each, for a total of 197 trained models, on a large set of 65 datasets. In addition, we evaluate the methods using six different classification performance metrics. Our results show that, although some methods repeatedly appear among the top-performing models, which methods perform best depends closely on the metric used to evaluate performance. We also analyze different aspects of the behavior of the methods. • We present a thorough comparison of multilabel methods. • The behavior of many methods depending on the base classifier is studied. • Six different metrics are used in the comparison. • Similar methods are clustered. • The behavior of the methods depending on dataset characteristics is analyzed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
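The finding that the best method depends on the metric is easy to reproduce with two widely used multilabel metrics: Hamming loss (the fraction of individual label assignments that are wrong) and subset accuracy (the fraction of instances whose entire label set is exactly right). The abstract does not list its six metrics, so these two are illustrative choices, and the predictions below are made up to show how rankings can flip.

```python
from typing import Sequence

LabelMatrix = Sequence[Sequence[int]]


def hamming_loss(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """Fraction of individual label assignments that are wrong."""
    n_labels = len(y_true[0])
    wrong = sum(t != p
                for row_t, row_p in zip(y_true, y_pred)
                for t, p in zip(row_t, row_p))
    return wrong / (len(y_true) * n_labels)


def subset_accuracy(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """Fraction of instances whose full label set is predicted exactly."""
    return sum(list(t) == list(p) for t, p in zip(y_true, y_pred)) / len(y_true)


truth = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
# Method A: one wrong label on every instance.
pred_a = [[1, 0, 0, 0], [0, 1, 1, 0], [1, 0, 0, 1]]
# Method B: exact on two instances, badly wrong on the third.
pred_b = [[1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 1, 0]]

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(name, hamming_loss(truth, pred), subset_accuracy(truth, pred))
```

Method A wins on Hamming loss (0.25 vs about 0.33) while method B wins on subset accuracy (about 0.67 vs 0.0), so a comparison that reports only one metric can crown either method.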
43. Compact network embedding for fast node classification.
- Author
- Shen, Xiaobo, Ong, Yew-Soon, Mao, Zheng, Pan, Shirui, Liu, Weiwei, and Zheng, Yuhui
- Subjects
- *VIRTUAL networks, *VECTOR spaces, *COMPUTATIONAL complexity, *CLASSIFICATION, *DEEP learning, *MATHEMATICAL convolutions
- Abstract
• Discrete network embedding (DNE) is proposed for compact representation. DNE leverages hash codes to represent nodes and dramatically reduces computational and storage costs. • Deep discrete attributed network embedding (DDANE) is proposed to effectively leverage node attributes and network structure in attributed networks. The proposed DDANE trains an improved graph convolutional network autoencoder to encode node attributes and network structure into latent discrete embeddings. • Extensive experiments demonstrate that the proposed methods exhibit lower storage and computational complexity than state-of-the-art network embedding methods and achieve satisfactory accuracy. Network embedding has shown promising performance in real-world applications. Network embeddings typically lie in a continuous vector space, where storage and computation costs are high, especially in large-scale applications. This paper proposes a more compact representation to fill this gap. The proposed discrete network embedding (DNE) leverages hash codes to represent nodes in Hamming space. The Hamming similarity between hash codes approximates the ground-truth similarity. The embedding and the classifier are jointly learned to improve compactness and discrimination. The proposed multi-class classifier is further constrained to be discrete to expedite classification. In addition, this paper extends DNE and proposes deep discrete attributed network embedding (DDANE) to learn compact deep embeddings from more informative attributed networks. From the perspective of generalized signal smoothing, the proposed DDANE trains an improved graph convolutional network autoencoder to effectively leverage node attributes and network structure. Extensive experiments on node classification demonstrate that the proposed methods exhibit lower storage and computational complexity than state-of-the-art network embedding methods and achieve satisfactory accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2023
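The storage and speed argument for hash codes is that a b-bit code fits in a few machine words and Hamming distance reduces to an XOR followed by a bit count, instead of a floating-point dot product. A minimal sketch with made-up 8-bit codes packed into Python integers; how DNE or DDANE actually learn the codes is not shown.

```python
from typing import Sequence


def pack(bits: Sequence[int]) -> int:
    """Pack a 0/1 code into an int, first bit as most significant."""
    value = 0
    for b in bits:
        value = (value << 1) | (b & 1)
    return value


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits: popcount of the XOR."""
    return bin(a ^ b).count("1")


def hamming_similarity(a: int, b: int, n_bits: int) -> float:
    """Similarity in [0, 1]: 1 - d_H / n_bits."""
    return 1.0 - hamming_distance(a, b) / n_bits


u = pack([1, 0, 1, 1, 0, 0, 1, 0])
v = pack([1, 0, 1, 0, 0, 0, 1, 1])
print(hamming_distance(u, v))       # 2
print(hamming_similarity(u, v, 8))  # 0.75
```

On Python 3.10 and later, `(a ^ b).bit_count()` is an equivalent and faster popcount; `bin(...).count("1")` is used here because it works on all Python 3 versions.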
44. 3D hand pose estimation from a single RGB image by weighting the occlusion and classification.
- Author
- Mahdikhanlou, Khadijeh and Ebrahimnezhad, Hossein
- Subjects
- *JOINTS (Anatomy), *OCCLUSION (Chemistry), *CLASSIFICATION, *PROBLEM solving
- Abstract
• We propose a large, real RGB dataset for 3D hand pose estimation that contains the 2D and 3D coordinates of the joints, the occlusion status of the joints, the classes of different gestures, and a segmentation of 15 parts of the hand. • We propose semantic segmentation of the hand parts to estimate the weight of the occlusion at the hand joints. • We exploit the occlusion information to estimate the 3D hand pose from a single RGB image. • The proposed framework is a hybrid of classification and estimation networks that increases the validity and accuracy of the predicted hand poses. • Classification of the hand poses is based on different aspects, including gesture, palm direction, and hand direction. In this paper, a new framework for 3D hand pose estimation from a single RGB image is proposed. The framework is composed of two blocks. The first block formulates hand pose estimation as a classification problem. Since the human hand can perform numerous poses, a classification network would need a huge number of parameters. Therefore, we propose to classify hand poses based on three different aspects: hand gesture, hand direction, and palm direction. In this way, the number of parameters is significantly reduced. The motivation behind the classification block is that the model deals with the image as a whole and extracts global features. Furthermore, the output of the classification model is a valid pose that does not include any unexpected joint angles. The second block estimates the 3D coordinates of the hand joints and focuses more on the details of the image pattern. RGB-based 3D hand pose estimation is an inherently ill-posed problem due to the lack of depth information in a 2D image. We propose to use the occlusion status of the hand joints to address this problem. The occlusion status of the joints has been labeled manually.
Some joints are partially occluded, and we propose to compute the extent of the occlusion by semantic segmentation. Existing methods in this field mostly use synthetic datasets, whereas all the models proposed in this paper are trained on more than 50K real images. Extensive experiments on our new dataset and two other benchmark datasets show that the proposed method achieves good performance. We also analyze the validity of the predicted poses, and the results show that the classification block increases the validity of the poses. [ABSTRACT FROM AUTHOR]
- Published
- 2023
45. Subject-adaptive Integration of Multiple SICE Brain Networks with Different Sparsity.
- Author
- Zhang, Jianjia, Zhou, Luping, and Wang, Lei
- Subjects
- *DIAGNOSIS of brain diseases, *BIOLOGICAL neural networks, *CLASSIFICATION, *MANIFOLDS (Mathematics), *INSTRUCTIONAL systems
- Abstract
As a principled method for partial correlation estimation, sparse inverse covariance estimation (SICE) has been employed to model brain connectivity networks, which holds great promise for brain disease diagnosis. For each subject, the SICE method naturally leads to a set of connectivity networks with various sparsity levels. However, existing methods usually select a single network from this set for classification, so the discriminative power of the set of networks has not been fully exploited. This paper argues that connectivity networks at different sparsity levels present complementary connectivity patterns and should therefore be considered jointly to achieve high classification performance. We propose a subject-adaptive method to integrate multiple SICE networks into a unified representation for classification. The integration weight is learned adaptively for each subject, giving the method the flexibility to deal with subject variations. Furthermore, to respect the manifold geometry of SICE networks, the Stein kernel is employed to embed the manifold structure into a kernel-induced feature space, which allows a linear integration of SICE networks to be designed. The optimization of the integration weight and the classification of the integrated networks are performed via a sparse representation framework. Our method provides a unified and effective network representation that is transparent to the sparsity level of SICE networks and can be readily used for further medical analysis. An experimental study on the ADHD and ADNI data sets demonstrates that the proposed integration method achieves a notable improvement in classification performance compared with methods using a single sparsity level of SICE networks and other commonly used integration methods, such as Multiple Kernel Learning. [ABSTRACT FROM AUTHOR]
- Published
- 2017
46. Signatures verification based on PNN classifier optimised by PSO algorithm.
- Author
- Porwik, Piotr, Doroz, Rafal, and Orczyk, Tomasz
- Subjects
- *BIOMETRIC identification, *PARTICLE swarm optimization, *PATTERN perception, *DATABASES, *NEURAL circuitry
- Abstract
In this paper, we propose a new biometric pattern recognition method. In classical techniques, only the features of raw objects are compared. In our approach, composed signature features are used. The features of a signature are associated with appropriate similarity coefficients and individually matched to a given signature. If necessary, the composed features can be reduced. In the proposed study, the most promising results are obtained with Hotelling's approach. Data comprising the composed features achieve a higher signature recognition level than unprocessed (raw) data. This is the main novelty of the paper: the proposed method of data reduction, together with a new type of similarity measure, gives a high signature recognition level for various classes of classifiers. Based on the investigations carried out, a classifier based on the Probabilistic Neural Network (PNN) has been introduced. Optimal parameters of the PNN have been determined by means of the Particle Swarm Optimization (PSO) procedure. The two-class PNN classifier demonstrates high efficiency compared to other classifiers. The described signature verification system consists of three units, in which features are captured, composed features are prepared, and the data are reduced and verified. The results of the study, carried out on signatures from the SVC2004 and MCYT databases, demonstrate the effectiveness of the proposed approach in comparison with other methods from the literature. [ABSTRACT FROM AUTHOR]
- Published
- 2016
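A Probabilistic Neural Network is essentially a Parzen-window classifier: the pattern layer places a Gaussian kernel on every training sample, the summation layer averages the kernels per class, and the output layer picks the class with the largest estimated density. Its main free parameter is the kernel width σ, which is the kind of parameter the abstract tunes with PSO. A minimal sketch with a fixed, hand-picked σ, equal class priors, and hypothetical two-dimensional "genuine"/"forgery" feature vectors; no PSO and none of the paper's composed features are included.

```python
import math
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]


def pnn_scores(x: Sequence[float], train: List[Sample], sigma: float) -> Dict[str, float]:
    """Per-class average of Gaussian kernels centred on the training patterns."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for pattern, label in train:
        sq = sum((a - b) ** 2 for a, b in zip(x, pattern))  # pattern layer
        k = math.exp(-sq / (2.0 * sigma ** 2))
        sums[label] = sums.get(label, 0.0) + k              # summation layer
        counts[label] = counts.get(label, 0) + 1
    # The (2*pi*sigma^2)^(d/2) normalizer is shared by all classes, so it is omitted.
    return {label: sums[label] / counts[label] for label in sums}


def pnn_classify(x: Sequence[float], train: List[Sample], sigma: float) -> str:
    """Output layer: class with the largest density estimate (equal priors)."""
    scores = pnn_scores(x, train, sigma)
    return max(scores, key=scores.get)


train = [((0.0, 0.0), "genuine"), ((0.2, 0.1), "genuine"),
         ((1.0, 1.0), "forgery"), ((0.9, 1.1), "forgery")]
print(pnn_classify((0.1, 0.0), train, sigma=0.3))
print(pnn_classify((1.0, 0.9), train, sigma=0.3))
```

A small σ makes the decision follow individual training samples closely, while a large σ smooths the class densities; because the best value is data-dependent, an outer search such as PSO is a natural way to set it.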
47. Cost-effectiveness of classification ensembles.
- Author
- Bhardwaj, Manju, Bhatnagar, Vasudha, and Sharma, Kapil
- Subjects
- *COST effectiveness, *CLASSIFICATION, *SUPERVISED learning, *EMPIRICAL research, *ALGORITHMS, *OBJECTIVE tests
- Abstract
Ensemble pruning is an important task in supervised learning because of the performance and efficiency advantages it brings to predictive modelling. Performance-based empirical comparison (primarily on accuracy) is the most common modus operandi for critical evaluation of ensembles pruned by different algorithms. A detailed analysis of the existing literature reveals that ensemble size is an ignored attribute when judging the quality of ensembles. In this paper, we argue that the cost-effectiveness of an ensemble is a function of both performance and size. Hence, an equitable comparison of two ensembles must take both attributes into account to judge their relative merits. Following this argument, we propose an objective function called the accrual function, which quantifies the difference in performance and size of two ensembles to gauge their relative cost-effectiveness. The function can be parameterized and has desirable mathematical properties, whose semantic interpretations are delineated in the paper. Finally, we apply the accrual function to published results from selected publications and demonstrate its ability to bring clarity when comparing ensembles. [ABSTRACT FROM AUTHOR]
- Published
- 2016
48. A flexible hierarchical approach for facial age estimation based on multiple features.
- Author
- Pontes, Jhony K., Britto Jr., Alceu S., Fookes, Clinton, and Koerich, Alessandro L.
- Subjects
- *HUMAN facial recognition software, *ESTIMATION theory, *FEATURE extraction, *WAVELETS (Mathematics), *SIGNAL quantization, *ACCESS control
- Abstract
Age estimation from facial images is receiving increasing attention for applications such as age-based access control and age-adaptive targeted marketing. Since even humans can err because of the complex biological processes involved, finding a robust method remains a research challenge. In this paper, we propose a new framework for the integration of Active Appearance Models (AAM), Local Binary Patterns (LBP), Gabor wavelets (GW), and Local Phase Quantization (LPQ) to obtain a highly discriminative feature representation that models shape, appearance, wrinkles, and skin spots. In addition, this paper proposes a novel flexible hierarchical age estimation approach consisting of a multi-class Support Vector Machine (SVM) that classifies a subject into an age group, followed by Support Vector Regression (SVR) that estimates a specific age. Errors in the classification step, caused by the hard boundaries between age classes, are compensated in the specific age estimation by a flexible overlapping of the age ranges. The performance of the proposed approach was evaluated on the FG-NET Aging and MORPH Album 2 datasets, achieving mean absolute errors (MAE) of 4.50 and 5.86 years, respectively. The robustness of the proposed approach was also evaluated on a merge of both datasets, where an MAE of 5.20 years was achieved. Furthermore, we compared age estimation by humans with the proposed approach, and the results show that the machine outperforms humans. The proposed approach is competitive with the current state of the art and provides additional robustness to blur, lighting, and expression variance, brought about by the local phase features. [ABSTRACT FROM AUTHOR]
- Published
- 2016
49. Joint Adaptive Median Binary Patterns for texture classification.
- Author
- Hafiane, Adel, Palaniappan, Kannappan, and Seetharaman, Guna
- Subjects
- *PATTERN recognition systems, *BINARY number system, *TEXTURE analysis (Image processing), *CLASSIFICATION, *PIXELS
- Abstract
This paper addresses the challenging problem of recognizing and classifying textured surfaces given a single instance acquired under unknown pose, scale, and illumination conditions. We propose a novel texture descriptor, the Adaptive Median Binary Pattern (AMBP), based on an adaptive analysis window of local patterns. The principal idea of the AMBP is to convert a small local image patch to a binary pattern using adaptive threshold selection that switches between the central pixel value, as used in the Local Binary Pattern (LBP), and the median, as in the Median Binary Pattern (MBP), within a variable-sized analysis window that depends on the local microstructure of the texture. The variability of the local adaptive window is included as joint information to increase the discriminative properties. A new multiscale scheme is also proposed in this paper to handle the texture resolution problem. AMBP is evaluated against other recent binary pattern techniques and many other texture analysis methods on three large texture corpora, with and without added noise: CUReT, Outex_TC00012, and KTH_TIPS2. Overall, the proposed method performs better than the best state-of-the-art techniques in the noiseless case and significantly outperforms all of them in the presence of impulse noise. [ABSTRACT FROM AUTHOR]
- Published
- 2015
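The two thresholds that AMBP switches between can be compared on a single 3×3 patch: LBP compares the eight neighbours with the centre pixel, while a median-based code compares them with the median of the patch. A minimal sketch, assuming a clockwise bit order starting at the top-left neighbour and a "greater than or equal" comparison; formulations differ on whether the median code also includes a bit for the centre pixel, and the adaptive window and joint encoding of AMBP are not reproduced.

```python
import statistics
from typing import Sequence

Patch = Sequence[Sequence[float]]

# Neighbour positions (row, col), clockwise from top-left; the order is a convention.
OFFSETS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def _encode(patch: Patch, threshold: float) -> int:
    """8-bit code: bit = 1 where the neighbour is >= threshold."""
    code = 0
    for r, c in OFFSETS:
        code = (code << 1) | (1 if patch[r][c] >= threshold else 0)
    return code


def lbp_code(patch: Patch) -> int:
    """LBP: threshold the neighbours at the centre pixel."""
    return _encode(patch, patch[1][1])


def median_code(patch: Patch) -> int:
    """Median-based code: threshold the neighbours at the median of all 9 pixels."""
    return _encode(patch, statistics.median(v for row in patch for v in row))


# A smooth patch whose centre pixel is hit by impulse noise (value 200).
patch = [[10, 12, 90], [11, 200, 13], [9, 14, 95]]
print(lbp_code(patch), median_code(patch))  # 0 60
```

With the corrupted centre, every neighbour falls below the LBP threshold and the structure of the patch is lost (code 0), while the median threshold (13 here) still separates the bright right-hand side from the dark left-hand side. This illustrates why median-based thresholds are attractive under impulse noise, as the abstract reports.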
50. Integrated convolutional neural networks for joint super-resolution and classification of radar images.
- Author
- Sharma, Rahul, Deka, Bhabesh, Fusco, Vincent, and Yurduseven, Okan
- Subjects
- *CONVOLUTIONAL neural networks, *OBJECT recognition (Computer vision), *COMPUTER vision, *SYNTHETIC aperture radar, *DEEP learning, *IMAGE recognition (Computer vision), *RANDOM noise theory, *MULTISPECTRAL imaging
- Abstract
Deep learning techniques have been widely used for two-dimensional (2D) and three-dimensional (3D) computer vision problems, such as object detection, super-resolution (SR), and classification, to name a few. Radar images suffer from poor resolution compared to optical images; hence, developing a high-accuracy model, such as a classifier, to solve computer vision problems is a challenge. This is because of the lack of high-frequency details in the input images, which makes it difficult for the classifier model to generate accurate predictions. Ways of addressing this challenge include training the learning model with a large dataset or using a more complicated model, such as a deeper layer architecture. However, such approaches might result in overfitting, where the model does not generalize well to previously unseen data. Also, generating a large dataset for training the model is a challenging task, especially in the case of radar images. This paper provides an alternative solution for achieving high accuracy in radar classification problems, wherein a CNN-enabled super-resolution (SR) model is integrated with the classifier model. The SR model is trained to generate high-resolution (HR) millimeter-wave (mmW) images from any input low-resolution (LR) mmW images. The resolved images from the SR model are then used by the classifier model to classify the input images into appropriate classes, consisting of threat and non-threat objects. The training data for the dual CNN layers are generated using a numerical model of a near-field coded-aperture computational imaging (CI) system. The trained dual CNN model is tested with simulated data generated from the CI numerical model, achieving a high classification accuracy of 95% and a fast inference time of 0.193 s, which makes it suitable for real-time automated threat classification applications.
For fair comparison, the developed CNN model is also validated with experimentally generated reconstruction data, in which case, a classification accuracy of 94% is obtained. • An integrated convolutional neural network is developed which jointly performs super-resolution and classification tasks. The super-resolution part enhances the resolution of millimeter-wave input images and the classifier part predicts a class for the enhanced image. • The super-resolution part of the model enhances both the cross-range and range resolutions of the input images. In addition, the model uses both the real and imaginary parts of the reconstruction data to perform the super-resolution task. • The super-resolution and the classifier models are trained on synthesized data instead of experimentally generated data. To offer a realistic approach, additive Gaussian noise is incorporated into the synthesized data. • Validation of the integrated model with both synthesized and experimental data shows a real-time accurate super-resolution and multi-label classification performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
Discovery Service for Jio Institute Digital Library