5,467 results
Search Results
2. Invited paper: Automatic speech recognition: History, methods and challenges
- Author
-
O’Shaughnessy, Douglas
- Subjects
- *AUTOMATIC speech recognition, *SPEECH perception, *COMPUTER input-output equipment, *PATTERN perception
- Abstract
Abstract: The field of automatic speech recognition (ASR) is discussed from the viewpoint of pattern recognition (PR). This tutorial examines the problem area, its methods, successes and failures, focusing on the nature of the speech signal and techniques to accomplish useful data reduction. Comparison is made with other areas of PR. Suggestions are given for areas of future progress. [Copyright © Elsevier]
- Published
- 2008
3. Filtering segmentation cuts for digit string recognition
- Author
-
Vellasques, E., Oliveira, L.S., Britto, A.S., Koerich, A.L., and Sabourin, R.
- Subjects
- *INFORMATION filtering, *PAPER, *HYPOTHESIS, *TRAILS
- Abstract
Abstract: In this paper we propose a method to evaluate segmentation cuts for handwritten touching digits. The idea is for the method to work as a filter in a segmentation-based recognition system. This kind of system usually relies on over-segmentation methods, where several segmentation hypotheses are created for each group of touching digits and then assessed by a general-purpose classifier. The novelty of the proposed methodology lies in the fact that unnecessary segmentation cuts can be identified without any classification attempt by a general-purpose classifier, reducing the number of paths in the segmentation graph, which can consequently reduce computational cost. A cost-based approach using ROC (receiver operating characteristic) analysis was deployed to optimize the filter. Experimental results show that the filter can eliminate up to 83% of the unnecessary segmentation hypotheses and increase the overall performance of the system. [Copyright © Elsevier]
- Published
- 2008
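The cost-based ROC filtering described in the abstract above can be sketched as a simple threshold sweep: score every candidate segmentation cut, then pick the operating point minimizing a weighted cost of missed necessary cuts versus retained unnecessary ones. The function names, scoring convention, and cost weights below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def best_threshold(scores, necessary, c_miss=5.0, c_keep=1.0):
    """Sweep thresholds over observed cut scores; a cut is kept if score >= t.

    The cost trades off filtering out a necessary cut (c_miss) against
    keeping an unnecessary one (c_keep), mimicking an ROC operating-point choice.
    """
    scores = np.asarray(scores, dtype=float)
    necessary = np.asarray(necessary, dtype=bool)
    best_t, best_cost = -np.inf, np.inf
    for t in np.unique(scores):
        keep = scores >= t
        # False-negative rate: necessary cuts that the filter would discard.
        fnr = (necessary & ~keep).sum() / max(necessary.sum(), 1)
        # False-positive rate: unnecessary cuts that survive the filter.
        fpr = (~necessary & keep).sum() / max((~necessary).sum(), 1)
        cost = c_miss * fnr + c_keep * fpr
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost
```

With well-separated scores, the sweep recovers the threshold that keeps all necessary cuts while discarding the unnecessary ones.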
4. A dynamic overproduce-and-choose strategy for the selection of classifier ensembles
- Author
-
Dos Santos, Eulanda M., Sabourin, Robert, and Maupin, Patrick
- Subjects
- *BALANCE of trade, *PAPER, *SUPPLY & demand, *BUSINESS cycles
- Abstract
Abstract: The overproduce-and-choose strategy, which is divided into overproduction and selection phases, has traditionally focused on finding the most accurate subset of classifiers at the selection phase and using it to predict the class of all samples in the test data set. It is therefore a static classifier ensemble selection strategy. In this paper, we propose a dynamic overproduce-and-choose strategy which combines optimization and dynamic selection in a two-level selection phase, allowing the most confident subset of classifiers to be selected to label each test sample individually. The optimization level is intended to generate a population of highly accurate candidate classifier ensembles, while the dynamic selection level applies measures of confidence to reveal the candidate ensemble with the highest degree of confidence in the current decision. Experimental results comparing the proposed method to a static overproduce-and-choose strategy and a classical dynamic classifier selection approach demonstrate that our method outperforms both selection-based methods, and also performs better than combining the decisions of all classifiers in the initial pool. [Copyright © Elsevier]
- Published
- 2008
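A minimal sketch of the dynamic-selection level described in the abstract above: each candidate ensemble evaluates the test sample, and the ensemble whose averaged prediction is most confident labels it. The margin between the top two class probabilities is used here as the confidence measure; it, along with the helper names, is an illustrative assumption (the paper applies its own measures of confidence).

```python
import numpy as np

def ensemble_confidence(probas):
    # probas: (n_members, n_classes) per-member class probabilities for one sample.
    # Confidence = margin between the two most likely classes of the averaged vote.
    avg = probas.mean(axis=0)
    top2 = np.sort(avg)[-2:]
    return top2[1] - top2[0]

def dynamic_select(candidate_ensembles, x):
    # candidate_ensembles: list of ensembles; each ensemble is a list of
    # callables mapping a sample to a class-probability vector.
    best_label, best_conf = None, -1.0
    for ensemble in candidate_ensembles:
        probas = np.stack([clf(x) for clf in ensemble])
        conf = ensemble_confidence(probas)
        if conf > best_conf:
            best_conf = conf
            best_label = int(np.argmax(probas.mean(axis=0)))
    return best_label, best_conf
```

For each test sample the winning ensemble can differ, which is exactly what distinguishes this dynamic scheme from a static overproduce-and-choose selection.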
5. Constraints on general motions for camera calibration with one-dimensional objects
- Author
-
Qi, Fei, Li, Qihe, Luo, Yupin, and Hu, Dongcheng
- Subjects
- *PAPER, *CALIBRATION, *EXAMPLE, *FEASIBILITY studies
- Abstract
Abstract: This paper focuses on two problems in camera calibration with one-dimensional (1D) objects: (a) finding the general motion patterns well suited to solving the calibration problem, and (b) improving the robustness and accuracy of the method. First, a sufficient and necessary condition for the solvability of 1D calibration with general motions is proved. Then the special motion of tossing a 1D object is given as an example to illustrate the correctness and feasibility of this condition. After that, some practical issues in obtaining the solution are examined. By avoiding singularities, the precision and robustness of the method are improved: the relative mean errors are reduced to less than 5% at a noise level of one pixel, which surpasses state-of-the-art methods of the same category. [Copyright © Elsevier]
- Published
- 2007
6. Exploring global information for session-based recommendation
- Author
-
Wang, Ziyang, Wei, Wei, Zou, Ding, Liu, Yifan, Li, Xiao-Li, Mao, Xian-Ling, and Qiu, Minghui
- Published
- 2024
7. Automatically classifying non-functional requirements using deep neural network
- Author
-
Li, Bing and Nong, Xiuwen
- Published
- 2022
8. Call for Papers: Similarity-based Pattern Recognition
- Published
- 2004
9. KPCA for semantic object extraction in images
- Author
-
Li, Jing, Li, Xuelong, and Tao, Dacheng
- Subjects
- *ALGORITHMS, *COLOR, *PAPER, *KERNEL functions
- Abstract
Abstract: In this paper, we kernelize conventional clustering algorithms from a novel point of view. Based on a full mathematical proof, we first demonstrate that kernel KMeans (KKMeans) is equivalent to kernel principal component analysis (KPCA) followed by the conventional KMeans algorithm. By using KPCA as a preprocessing step, we also generalize the Gaussian mixture model (GMM) to its kernel version, the kernel GMM (KGMM). Consequently, conventional clustering algorithms can be easily kernelized in the linear feature space instead of a nonlinear one. To evaluate the newly established KKMeans and KGMM algorithms, we applied them to the problem of semantic object extraction (segmentation) in color images. Based on a series of experiments carried out on a set of color images, we show that both KKMeans and KGMM offer more elaborate output than the conventional KMeans and GMM, respectively. [Copyright © Elsevier]
- Published
- 2008
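The equivalence claimed in the abstract above — kernel KMeans as KPCA followed by ordinary KMeans — suggests a two-step recipe that is easy to sketch with off-the-shelf tools. The RBF kernel choice, gamma value, and component count below are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.cluster import KMeans

def kernel_kmeans_via_kpca(X, n_clusters, gamma=0.5, n_components=2, seed=0):
    # Step 1: project the data into the kernel principal-component space
    # (RBF kernel), which linearizes nonlinear cluster structure.
    Z = KernelPCA(n_components=n_components, kernel="rbf", gamma=gamma).fit_transform(X)
    # Step 2: run conventional KMeans in that linearized feature space.
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(Z)
```

On two well-separated blobs this pipeline recovers the blob membership, just as plain KMeans would; the benefit of the kernel step appears on clusters that are not linearly separable in the input space.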
10. Sharing secrets in stego images with authentication
- Author
-
Chang, Chin-Chen, Hsieh, Yi-Pei, and Lin, Chia-Hsuan
- Subjects
- *CRYPTOGRAPHY, *AUTHENTICATION (Law), *PAPER, *RELIABILITY (Personality trait)
- Abstract
Abstract: Recently, Lin and Tsai and Yang et al. proposed secret image sharing schemes with steganography and authentication, which divide a secret image into shadows and embed the produced shadows in cover images to form stego images that can be transmitted to authorized recipients securely. In addition, these schemes include authentication mechanisms to verify the integrity of the stego images so that the secret image can be restored correctly. Unfortunately, these schemes still have two shortcomings. One is that the weak authentication cannot adequately protect the integrity of the stego images, so the secret image cannot be recovered completely. The other is that the visual quality of the stego images is not good enough. To overcome these drawbacks, in this paper we propose a novel secret image sharing scheme combining steganography and authentication based on the Chinese remainder theorem (CRT). The proposed scheme not only improves the authentication ability but also enhances the visual quality of the stego images. The experimental results show that the proposed scheme is superior to previously existing methods. [Copyright © Elsevier]
- Published
- 2008
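The CRT machinery underlying the scheme above can be illustrated at its most basic: split a secret number into residues modulo pairwise-coprime moduli, and rebuild it with the Chinese remainder theorem. This toy sketch ignores the image, steganographic embedding, and authentication layers of the actual scheme; the function names and moduli are illustrative.

```python
from math import prod

def make_shares(secret, moduli):
    # moduli must be pairwise coprime with prod(moduli) > secret,
    # so the residues determine the secret uniquely.
    return [(m, secret % m) for m in moduli]

def crt_reconstruct(shares):
    # Standard CRT recombination: x = sum(r_i * M_i * M_i^{-1} mod m_i) mod M.
    M = prod(m for m, _ in shares)
    x = 0
    for m, r in shares:
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)  # three-argument pow gives the modular inverse (Python 3.8+)
    return x % M
```

Reconstruction from all shares returns the original secret exactly; threshold variants restrict which subsets of shares suffice by constraining the moduli sizes.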
11. A two-codebook combination and three-phase block matching based image-hiding scheme with high embedding capacity
- Author
-
Hsieh, Yi-Pei, Chang, Chin-Chen, and Liu, Li-Jen
- Subjects
- *SUBSTITUTION (Technology), *QUALITY, *PAPER, *IMAGE
- Abstract
Abstract: Image hiding is a technique that embeds important images into a cover image such that the important images are imperceptible and can be securely transmitted to the receiver. In such research, the common goals are to enlarge the embedding capacity as much as possible while only slightly degrading the visual quality of the cover image, and to keep the visual quality of the important images high when they are extracted from the stego image. In this paper, we propose an image-hiding method based on a two-codebook combination, a three-phase block matching procedure, and modulus substitution. The proposed method achieves these benefits: (1) multiple, relatively large important images can be embedded into a relatively small cover image; (2) the quality of the stego image after embedding the secret data is not distorted significantly; (3) the important images have acceptable visual quality after they are extracted. The experimental results also show that the proposed method is more flexible than previous methods. [Copyright © Elsevier]
- Published
- 2008
12. Recognition of degraded characters using dynamic Bayesian networks
- Author
-
Likforman-Sulem, Laurence and Sigelle, Marc
- Subjects
- *BAYESIAN analysis, *MARKOV processes, *PAPER, *COUPLINGS (Gearing)
- Abstract
Abstract: In this paper, we investigate the application of dynamic Bayesian networks (DBNs) to the recognition of degraded characters. DBNs are an extension of one-dimensional hidden Markov models (HMMs) which can handle several observation and state sequences. In our study, characters are represented by the coupling of two HMM architectures into a single DBN model. The interacting HMMs are a vertical HMM and a horizontal HMM whose observable outputs are the image columns and image rows, respectively. Various couplings are proposed where interactions are achieved through the causal influence between state variables. We compare non-coupled and coupled models on two tasks: the recognition of artificially degraded handwritten digits and the recognition of real degraded old printed characters. Our models show that coupled architectures perform more accurately on degraded characters than basic HMMs, the linear combination of independent HMM scores, and discriminative methods such as support vector machines (SVMs). [Copyright © Elsevier]
- Published
- 2008
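The DBN models above couple ordinary HMMs whose observations are image columns and rows. The core HMM computation they build on — the forward pass that scores an observation sequence — can be sketched for discrete observations. The variable names and the unscaled (non-log-space) recursion are simplifications; real implementations scale or work in log space to avoid underflow on long sequences.

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    # obs: sequence of discrete observation indices (e.g. quantized image columns)
    # pi:  (S,)  initial state probabilities
    # A:   (S,S) transition probabilities, A[i, j] = P(state j | state i)
    # B:   (S,V) emission probabilities,  B[j, v] = P(obs v | state j)
    alpha = pi * B[:, obs[0]]           # initialization
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # forward recursion (unscaled)
    return np.log(alpha.sum())          # log-likelihood of the sequence
```

Coupling a column-observation HMM with a row-observation HMM, as the paper does, amounts to letting the state variables of two such chains influence each other inside one DBN.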
13. Pattern Recognition: Call for Papers Special Issue on Agent based Computer Vision.
- Author
-
Rosin, Paul and Rana, Omer, University of Cardiff
- Published
- 2002
14. Imprecise Gaussian discriminant classification
- Author
-
Carranza Alarcón, Yonatan Carlos and Destercke, Sébastien
- Published
- 2021
15. Optical correlator for recognizing characters printed on paper
- Author
-
Smolińska, B.
- Published
- 1983
16. AlignedReID++: Dynamically matching local information for person re-identification
- Author
-
Luo, Hao, Jiang, Wei, Zhang, Xuan, Fan, Xing, Qian, Jingjing, and Zhang, Chi
- Published
- 2019
17. Fast main density peak clustering within relevant regions via a robust decision graph
- Author
-
Guan, Junyi, Li, Sheng, Zhu, Jinhui, He, Xiongxiong, and Chen, Jiajia
- Published
- 2024
18. On the classification of dynamical data streams using novel “Anti-Bayesian” techniques
- Author
-
Hammer, Hugo Lewi, Yazidi, Anis, and Oommen, B. John
- Published
- 2018
19. GhostFormer: Efficiently amalgamated CNN-transformer architecture for object detection
- Author
-
Xie, Xin, Wu, Dengquan, Xie, Mingye, and Li, Zixi
- Published
- 2024
20. Synthetic unknown class learning for learning unknowns.
- Author
-
Jang, Jaeyeon
- Subjects
- *POSSIBILITY
- Abstract
This paper addresses the open set recognition (OSR) problem, where the goal is to correctly classify samples of known classes while detecting unknown samples to reject. In the OSR problem, "unknown" is assumed to have infinite possibilities because we have no knowledge about unknowns until they emerge. Intuitively, the more an OSR system explores the possibilities of unknowns, the more likely it is to detect unknowns. Even though several generative OSR models have been proposed to explore more by generating synthetic samples and learning them as unknowns, the generated samples are limited to a small subspace of the known classes. Thus, this paper proposes a novel synthetic unknown class learning method that constantly generates unknown-like samples while maintaining diversity between the generated samples. By learning the unknown-like samples and known samples in an alternating manner, the proposed method can not only experience diverse synthetic unknowns but also reduce overgeneralization with respect to known classes. Experiments on several benchmark datasets show that the proposed method significantly outperforms other state-of-the-art approaches by generating diverse realistic unknown samples. • A novel generative open set recognition (OSR) model is developed. • The limitation of generative OSR models that generate limited samples is addressed. • A new learning technique generates realistic unknown-like samples and learns them. • Knowledge distillation is employed to reduce overgeneralization on known classes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
21. DynamicKD: An effective knowledge distillation via dynamic entropy correction-based distillation for gap optimizing.
- Author
-
Zhu, Songling, Shang, Ronghua, Yuan, Bo, Zhang, Weitong, Li, Wenjie, Li, Yangyang, and Jiao, Licheng
- Subjects
- *DISTILLATION, *KNOWLEDGE gap theory, *ENTROPY, *ENTROPY (Information theory)
- Abstract
Knowledge distillation uses a high-performance teacher network to guide the student network. However, the performance gap between the teacher and student networks can affect the student's training. This paper proposes a novel knowledge distillation algorithm based on dynamic entropy correction, which adjusts the student instead of the teacher to reduce the gap. Firstly, the effect of changing the student's output entropy (short for output information entropy) on the distillation loss is analyzed in theory. This paper shows that correcting the output entropy can reduce the gap. Then, a knowledge distillation algorithm based on dynamic entropy correction is created, which corrects the output entropy in real time with an entropy controller updated dynamically by the distillation loss. The proposed algorithm is validated on CIFAR100, ImageNet, and PASCAL VOC 2007. Comparison with various state-of-the-art distillation algorithms shows impressive results, especially in the CIFAR100 experiment on the teacher–student pair resnet32x4–resnet8x4, where the proposed algorithm improves classification accuracy by 2.64 points over the traditional distillation algorithm and by 0.87 points over the state-of-the-art algorithm CRD, demonstrating its effectiveness and efficiency. • This paper proposes a novel knowledge distillation algorithm called DynamicKD. • DynamicKD designs an entropy controller to reduce the distillation gap in real time. • DynamicKD uses dynamic entropy correction to reduce the learning difficulty. • DynamicKD uses a single entropy controller to help the student's learning. • Experimental results show the effectiveness and efficiency of DynamicKD. [ABSTRACT FROM AUTHOR]
- Published
- 2024
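DynamicKD's starting point — that the softened-output entropy interacts with the distillation loss — can be seen in the standard knowledge-distillation ingredients: a temperature-scaled softmax raises output entropy, and the distillation loss compares softened teacher and student distributions. The sketch below shows only these standard ingredients, not the paper's dynamic entropy controller, and the function names are illustrative.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; larger T yields a flatter (higher-entropy) output.
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p):
    # Shannon entropy of a probability vector (output information entropy).
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum()

def kd_loss(student_logits, teacher_logits, T):
    # The usual distillation loss: KL(teacher || student) on
    # temperature-softened output distributions.
    p_t = softmax(teacher_logits, T)
    p_s = np.clip(softmax(student_logits, T), 1e-12, 1.0)
    return (p_t * (np.log(p_t) - np.log(p_s))).sum()
```

An entropy-correction scheme in the spirit of the paper would adjust how the student's output entropy is shaped during training, driven by the observed distillation loss, rather than tuning the teacher.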
22. Deep and wide nonnegative matrix factorization with embedded regularization.
- Author
-
Moayed, Hojjat and Mansoori, Eghbal G.
- Subjects
- *MATRIX decomposition, *NONNEGATIVE matrices, *PATTERN recognition systems, *TIME complexity, *COMPUTATIONAL complexity, *NEURONS, *DEEP learning, *FEATURE extraction
- Abstract
• Paper proposes Deep and Wide Nonnegative Matrix Factorization with embedded regularization (DWNMF) as an end-to-end model. • It prevents overfitting via embedded regularization, and decreases vanishing gradients by training layers independently. • Model size can grow incrementally to achieve the desired performance; memory is saved since only each layer's parameters are held. • Experimental results showed that DWNMF performs better than end-to-end feature learning models in complexity and CPU time. End-to-end learning is an advanced framework in deep learning. It combines feature extraction with pattern recognition (classification, clustering, etc.) in a unified learning structure. However, these deep networks face several challenges, such as overfitting, vanishing gradients, computational complexity, information loss across layers, and weak robustness to noisy data/features. To address these challenges, this paper presents Deep and Wide Nonnegative Matrix Factorization (DWNMF) with embedded regularization for the feature extraction stage of end-to-end models. DWNMF aims to identify more robust features while preventing overfitting via embedded regularization. For this purpose, DWNMF integrates input data with its noisy versions as diverse augmented channels. Then, the features in all channels are extracted in parallel using distinct network branches. The parameters of this model learn the intrinsic hierarchical features in the channels of complex data objects. Finally, the extracted features in the different channels are aggregated in a single feature space to perform the classification task. To embed regularization in the DWNMF model, some NMF neurons in the layers are replaced by random neurons to increase the stability and robustness of the extracted features. Experimental results confirm that the DWNMF model extracts more robust features, prevents overfitting, and achieves better classification accuracy compared to state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
23. Dual GroupGAN: An unsupervised four-competitor (2V2) approach for video anomaly detection.
- Author
-
Sun, Zhe, Wang, Panpan, Zheng, Wang, and Zhang, Meng
- Subjects
- *ANOMALY detection (Computer security), *INTRUSION detection systems (Computer security), *VIDEOS, *GLOBAL method of teaching
- Abstract
• We propose a dual GroupGAN network constructed as a SENet-based four-competitor (2V2) model, which leverages the predicted video frame as input to the reconstruction network to amplify the reconstruction error and improve the detection of anomalies. • The proposed approach can effectively enhance crucial spatial-temporal features in video frames, thereby better preserving normal patterns in memory. • The effectiveness of the proposed approach is demonstrated through extensive experiments on three standard public VAD datasets. In response to the issues of overgeneralization in reconstruction-based methods and noise sensitivity in prediction-based methods for video anomaly detection, this paper proposes a novel unsupervised video anomaly detection approach using a dual GroupGAN, referred to as a four-competitor (2V2) model, based on a channel attention mechanism. Our approach incorporates a channel attention mechanism into two generators, namely SE-U-Net and SE-VAE, which respectively serve as the prediction and reconstruction networks. The SE-U-Net captures essential spatio-temporal features and automatically calibrates the channel dimension, while the SE-VAE learns global features from associated video frames. A weighting strategy is used to fuse the anomaly scores of the two networks and balance their emphasis on spatio-temporal feature representation. In summary, the proposed prediction network (SE-U-Net) is resistant to overgeneralization and improves the quality of the reconstruction network (SE-VAE) when the predicted frame is used as the input of SE-VAE. Also, the SE-VAE enhances predicted future frames from normal events, thereby increasing the robustness of the SE-U-Net. Experimental results on the UCSD Ped2, CUHK Avenue, and ShanghaiTech datasets demonstrate the effectiveness of the proposed approach both qualitatively and quantitatively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
24. Incremental convolutional transformer for baggage threat detection.
- Author
-
Hassan, Taimur, Hassan, Bilal, Owais, Muhammad, Velayudhan, Divya, Dias, Jorge, Ghazal, Mohammed, and Werghi, Naoufel
- Subjects
- *DEEP learning, *LUGGAGE, *MACHINE learning
- Abstract
Detecting cluttered and overlapping contraband items in baggage scans is one of the most challenging tasks, even for human experts. Recently, considerable literature has grown up around the theme of deep learning-based X-ray screening for localizing contraband items. However, the existing threat detection systems are still vulnerable to high occlusion, clutter, and concealment. Furthermore, they require exhaustive training routines on large-scale and well-annotated data in order to produce accurate results. To overcome the above-mentioned limitations, this paper presents a novel convolutional transformer system that recognizes different overlapping instances of prohibited objects in complex baggage X-ray scans via a distillation-driven incremental instance segmentation scheme. Furthermore, unlike its competitors, the proposed framework allows an incremental integration of new item instances while avoiding costly training routines. In addition, the proposed framework outperforms state-of-the-art approaches, achieving mean average precision scores of 0.7896, 0.5974, and 0.7569 on the publicly available GDXray, SIXray, and OPIXray datasets for detecting concealed and cluttered baggage threats. • This paper presents a novel incremental convolutional transformer model. • A β hyperparameter is introduced to control catastrophic forgetting. • A unique segmentation scheme is proposed to extract cluttered object instances. • The proposed system is thoroughly tested on three public X-ray datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
25. Cross co-teaching for semi-supervised medical image segmentation.
- Author
-
Zhang, Fan, Liu, Huiying, Wang, Jinjiang, Lyu, Jun, Cai, Qing, Li, Huafeng, Dong, Junyu, and Zhang, David
- Subjects
- *DIAGNOSTIC imaging, *TEACHING teams
- Abstract
Excellent performance has been achieved in semi-supervised medical image segmentation, but existing algorithms perform relatively poorly for objects with variable topologies and weak boundaries. In this paper, we propose a novel cross co-teaching framework, called Cross-structure-task Collaborative Teaching (CroCT), which not only handles variable topologies effectively but also strengthens learning of the weak boundaries of unlabeled data. Specifically, a new cross-structure-task collaborative teaching mechanism is developed based on our designed "E-Net" structure, composed of a shared encoder and two decoder branches with distinct learning paradigms, which asks the two branches to regress topology-aware signed distance functions and densely predicted segmentation masks for each other. Powered by the collaboration across different structural biases and sequence-related tasks, our CroCT can extract more discriminative yet complementary representations from abundant raw medical data to improve the generalization of consistency learning, further boosting performance on highly diverse shapes and topological changes within and across slices. Besides, it guarantees diversity at multiple levels, i.e., from structure and task perspectives, to preclude prediction uncertainty. In addition, a novel adaptive boundary enhancing (ABE) module is proposed to introduce compact, annularly enhanced boundary features into semi-supervised training, which significantly improves weak-boundary perception for unlabeled data while facilitating collaborative teaching that efficiently propagates complementary knowledge across the different branches. Extensive experiments on three challenging medical benchmarks, employing different labeled settings, demonstrate the superiority of our CroCT over recent state-of-the-art competitors. • A novel cross-structure-task collaborative teaching framework is presented. • ABE facilitates the efficient collaboration and fusion of complementary knowledge. • Variable topology and weak boundary issues in SSMIS are well addressed in this paper. • Results on three challenging SSMIS benchmarks confirm the superiority of our CroCT. [ABSTRACT FROM AUTHOR]
- Published
- 2024
26. SceneFake: An initial dataset and benchmarks for scene fake audio detection.
- Author
-
Yi, Jiangyan, Wang, Chenglong, Tao, Jianhua, Zhang, Chu Yuan, Fan, Cunhang, Tian, Zhengkun, Ma, Haoxin, and Fu, Ruibo
- Subjects
- *SPEECH enhancement, *SOURCE code, *SIGNAL-to-noise ratio, *PROSODIC analysis (Linguistics)
- Abstract
Many datasets have been designed to further the development of fake audio detection. However, fake utterances in previous datasets are mostly generated by altering the timbre, prosody, linguistic content or channel noise of the original audio. These datasets leave out a scenario in which the acoustic scene of an original audio recording is manipulated with a forged one. It will pose a major threat to our society if people misuse such manipulated audio for malicious purposes. This motivates us to fill in the gap. This paper proposes such a dataset for scene fake audio detection, named SceneFake, where a manipulated audio recording is generated by tampering only with the acoustic scene of a real utterance using speech enhancement technologies. Some scene fake audio detection benchmark results on the SceneFake dataset are reported in this paper. In addition, an analysis of fake attacks with different speech enhancement technologies and signal-to-noise ratios is presented. The results indicate that scene fake utterances cannot be reliably detected by baseline models trained on the ASVspoof 2019 dataset. Although these models perform well on the SceneFake training set and the seen test set, their performance is poor on the unseen test set. The dataset (https://zenodo.org/record/7663324#.Y%5fXKMuPYuUk) and benchmark source code (https://github.com/ADDchallenge/SceneFake) are publicly available. • This paper proposes a new problem: scene fake audio detection. • This is the first attempt to pose such an audio fake attack using speech enhancement. • This paper designs a dataset and provides benchmarks for scene fake audio detection. • The dataset provides speech enhancement technology information for fake utterances. • The dataset and benchmark source code are publicly available. • The dataset will further foster research on fake audio detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
27. Graph embedding orthogonal decomposition: A synchronous feature selection technique based on collaborative particle swarm optimization.
- Author
-
Zhong, Jingyu, Shang, Ronghua, Xu, Songhua, and Li, Yangyang
- Subjects
- *ORTHOGONAL decompositions, *FEATURE selection, *PARTICLE swarm optimization
- Abstract
• This paper proposes a synchronous feature selection technique based on graph-embedded cluster label orthogonal decomposition and collaborative particle swarm optimization (GOD-cPSO). • GOD-cPSO extends the feature selection framework of clustering label orthogonal decomposition by graph embedding. • The ℓ2,1-2-norm with strong global convergence is extended to the graph-embedded clustering label orthogonal decomposition framework. • Local structure preservation of low-dimensional manifolds is integrated into the graph-embedded clustering label orthogonal decomposition framework. • GOD-cPSO synchronously guides the graph-embedded clustering label orthogonal decomposition framework in feature selection through collaborative particle swarm optimization. In unsupervised feature selection, the clustering label matrix has the ability to distinguish between projection clusters. However, the latent geometric structure of the clustering labels is often ignored. In addition, the optimal sub-feature selection performance of feature selection techniques relies greatly on the choice of balance parameters, and the parameter selection range of most techniques is limited and fixed. To solve these problems, this paper proposes a synchronous feature selection technique based on graph-embedded cluster label orthogonal decomposition and collaborative particle swarm optimization (GOD-cPSO). First, GOD-cPSO extends the feature selection framework of clustering label orthogonal decomposition by graph embedding to retain the latent geometric structure of the clustering labels, thus maintaining the correlation between clustered sample labels. Then, the ℓ2,1-2-norm with strong global convergence is extended to the graph-embedded clustering label orthogonal decomposition framework. By imposing this non-convex constraint, GOD-cPSO can achieve low-dimensional, sparse, and low-redundancy sub-features. In addition, local structure preservation of low-dimensional manifolds is integrated into the framework to obtain good cluster separation and effectively maintain the latent local structure of the data. Finally, to ensure adaptive parameter selection over a large range, GOD-cPSO synchronously guides the graph-embedded clustering label orthogonal decomposition framework in feature selection through collaborative particle swarm optimization. GOD-cPSO performs parameter optimization and feature selection synchronously and picks parameters from a larger range. Comprehensive numerical experiments are performed on nine datasets to test the validity of GOD-cPSO. The experimental results demonstrate that the sub-features selected by GOD-cPSO have stronger discriminative power and are superior to other techniques in clustering assignments. [ABSTRACT FROM AUTHOR]
- Published
- 2024
28. Multi-target label backdoor attacks on graph neural networks.
- Author
-
Wang, Kaiyang, Deng, Huaxin, Xu, Yijia, Liu, Zhonglin, and Fang, Yong
- Subjects
- *GRAPH neural networks, *POISONS
- Abstract
Graph neural networks have been shown to have characteristics that make them susceptible to backdoor attacks, and many recent works have proposed feasible graph backdoor attack methods. However, existing graph backdoor attack methods only target one-to-one attack types and lack graph backdoor attack methods that can address one-to-many attack requirements. This paper is the first research work on one-to-many type graph backdoor attacks and proposes the backdoor attack method MLGB, which can achieve multi-target label attacks for GNN node classification tasks. We designed encoding mechanisms to allow MLGB to customize triggers for different target labels and ensure differentiation between triggers for different target labels through loss functions. Additionally, we designed an innovative poisoned node selection method to improve the efficiency of MLGB's attacks further. Extensive experiments were conducted to validate MLGB's effectiveness across multiple datasets and model architectures, demonstrating its robustness against graph backdoor attack defense mechanisms. Furthermore, ablation experiments and explainability analyses were conducted to provide deeper insights into MLGB. Our work reveals that graph neural networks are also vulnerable to one-to-many type backdoor attacks, which is important for practitioners to understand model risks comprehensively. • To our knowledge, this paper is the first work in the field of one-to-many backdoor attacks on graph neural networks. • We propose MLGB, a graph backdoor attack method that enables attackers to set multiple target labels simultaneously. • We design a poison node selection method to enhance the efficiency of graph backdoor attacks. • We design an encoding mechanism and loss functions tailored for multi-target requirements. • We perform large-scale experiments, and comprehensively evaluate the effectiveness and stealthiness of MLGB. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. Federated learning for medical image analysis: A survey.
- Author
-
Guan, Hao, Yap, Pew-Thian, Bozoki, Andrea, and Liu, Mingxia
- Subjects
- *
FEDERATED learning , *COMPUTER-assisted image analysis (Medicine) , *IMAGE analysis , *MACHINE learning , *DIAGNOSTIC imaging - Abstract
Machine learning in medical imaging often faces a fundamental dilemma, namely, the small sample size problem. Many recent studies suggest using multi-domain data pooled from different acquisition sites/centers to improve statistical power. However, medical images from different sites cannot be easily shared to build large datasets for model training due to privacy protection reasons. As a promising solution, federated learning, which enables collaborative training of machine learning models based on data from different sites without cross-site data sharing, has attracted considerable attention recently. In this paper, we conduct a comprehensive survey of the recent development of federated learning methods in medical image analysis. We have systematically gathered research papers on federated learning and its applications in medical image analysis published between 2017 and 2023. Our search and compilation were conducted using databases from IEEE Xplore, ACM Digital Library, Science Direct, Springer Link, Web of Science, Google Scholar, and PubMed. In this survey, we first introduce the background of federated learning for dealing with privacy protection and collaborative learning issues. We then present a comprehensive review of recent advances in federated learning methods for medical image analysis. Specifically, existing methods are categorized based on three critical aspects of a federated learning system, including client end, server end, and communication techniques. In each category, we summarize the existing federated learning methods according to specific research problems in medical image analysis and also provide insights into the motivations of different approaches. In addition, we provide a review of existing benchmark medical imaging datasets and software platforms for current federated learning research. We also conduct an experimental study to empirically evaluate typical federated learning methods for medical image analysis. 
This survey can help to better understand the current research status, challenges, and potential research opportunities in this promising research field. • Summarize existing methods from a system perspective. • Introduce different methods in a "question–answer" paradigm. • Introduce software platforms and benchmark datasets. • Conduct an experimental study. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
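The survey above frames federated learning as collaborative model training across sites without cross-site data sharing. A minimal sketch of the canonical aggregation rule (FedAvg: clients train locally, the server averages updates weighted by local sample counts); the toy logistic-regression model and simulated "sites" are illustrative, not any specific method from the survey:

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: plain logistic-regression gradient descent."""
    w = weights.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # sigmoid predictions
        w -= lr * X.T @ (p - y) / len(y)   # gradient step on log-loss
    return w

def fedavg_round(global_w, clients):
    """Server step: aggregate client models weighted by local dataset size."""
    updates, sizes = [], []
    for X, y in clients:
        updates.append(local_update(global_w, X, y))
        sizes.append(len(y))
    return np.average(updates, axis=0, weights=np.array(sizes, dtype=float))

rng = np.random.default_rng(0)
# Two simulated "sites" whose raw data never leaves the client.
clients = []
for _ in range(2):
    X = rng.normal(size=(50, 3))
    y = (X[:, 0] + 0.1 * rng.normal(size=50) > 0).astype(float)
    clients.append((X, y))

w = np.zeros(3)
for _ in range(10):
    w = fedavg_round(w, clients)
```

Only model parameters cross the client/server boundary, which is the privacy property the survey emphasizes; real systems add the communication and server-side techniques the survey categorizes.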
30. Rigid pairwise 3D point cloud registration: A survey.
- Author
-
Lyu, Mengjin, Yang, Jie, Qi, Zhiquan, Xu, Ruijie, and Liu, Jiabin
- Subjects
- *
POINT cloud , *RECORDING & registration , *RESEARCH personnel , *BINOCULAR vision - Abstract
Over the past years, 3D point cloud registration has attracted unprecedented attention. Researchers have developed various approaches, such as optimization-based and deep learning-based methods, to tackle this challenging task. To systematically sort out the relevant literature and follow the state-of-the-art solutions, this paper conducts a thorough survey. We propose a novel taxonomy dubbed Intermediates Based Taxon (IBTaxon), which effectively categorizes multifarious registration approaches by the introduced intermediate variables or the leveraged intermediate modules. We further delve into each of the categories and present a comprehensive technique review with a focus on the distinct insight behind each of the methods. Besides, the relevant datasets and evaluation metrics are also combed and reorganized. We conclude our paper by discussing possible open research problems and presenting our visions for future research in the field of 3D point cloud registration. • A novel taxonomy dubbed IBTaxon is proposed, which categorizes registration methods as one-stage and two-stage approaches. • Following the IBTaxon, alternate and sequential optimization are introduced as two strategies by which two-stage approaches achieve alignment. • Several widely used datasets and metrics for the standard evaluation and comparison of various registration methods are combed. • Experimental performances of several representative methods have been arranged and analyzed to provide a reference. • We conclude that balancing accuracy, speed, and robustness matters more than optimizing any single indicator. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
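For the correspondence-based special case of the rigid pairwise registration problem surveyed above, a closed-form solution exists: the classical Kabsch/Procrustes alignment via SVD of the cross-covariance matrix. A minimal sketch (a standard baseline, not tied to the survey's IBTaxon categories or to any correspondence-estimation stage):

```python
import numpy as np

def kabsch_register(src, dst):
    """Closed-form rigid alignment of two corresponding point sets
    (Kabsch algorithm): returns rotation R and translation t with
    dst ~= src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # correct for reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

rng = np.random.default_rng(1)
pts = rng.normal(size=(30, 3))
theta = 0.4
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -1.0, 2.0])
moved = pts @ R_true.T + t_true

R_est, t_est = kabsch_register(pts, moved)
```

Much of the literature the survey organizes addresses the harder setting where correspondences are unknown or noisy; this closed-form step is what those pipelines reduce to once correspondences are fixed.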
31. Multi-task hierarchical convolutional network for visual-semantic cross-modal retrieval.
- Author
-
Ji, Zhong, Lin, Zhigang, Wang, Haoran, Pang, Yanwei, and Li, Xuelong
- Subjects
- *
CONVOLUTIONAL neural networks , *MODAL logic , *SEMANTICS - Abstract
Bridging visual and textual representations plays a central role in delving into multimedia data understanding. The main challenge arises from the fact that images and texts exist in heterogeneous spaces, which makes it difficult to preserve the semantic consistency between both modalities. To narrow the modality gap, most recent methods resort to extra object detectors or parsers to obtain the hierarchical representations. In this work, we address this problem by introducing our Multi-Task Hierarchical Convolutional Neural Network (MT-HCN). It is characterized by mining the hierarchical semantic information without the aid of any extra supervision. Firstly, from the perspective of representing architecture, we leverage the intrinsic hierarchical structure of Convolutional Neural Networks (CNNs) to decompose the representations of both modalities into two semantically complementary levels, i.e., exterior representations and concept representations. The former focuses on discovering the fine-grained low-level associations between both modalities, while the latter captures more high-level abstract semantics. Specifically, we present a Self-Supervised Clustering (SSC) loss to preserve more fine-grained semantic clues in exterior representations. It is built by treating multiple image/text pairs with similar exteriors as one category. In addition, a novel harmonious bidirectional triplet ranking (HBTR) loss is proposed, which mitigates the adverse effects of biased and noisy negative samples. Besides the hardest negatives, it also imposes constraints on the distance between the positive pairs and the centroid of negative pairs. Extensive experimental results on two popular cross-modal retrieval benchmarks demonstrate that our proposed MT-HCN achieves competitive results compared with the state-of-the-art methods. 
• This paper proposes a novel Multi-Task Hierarchical Convolutional Network (MT-HCN) for visual-semantic cross-modal retrieval, which is characterized by adopting a classification task to improve hierarchical multi-modal representation learning. • This paper proposes a novel Self-Supervised Clustering (SSC) loss to learn exterior representations that fully exploit low-level fine-grained correlation for associating images and texts. • This paper presents an effective bidirectional ranking loss, namely Harmonious Bidirectional Triplet Ranking (HBTR), for cross-modal correlation preserving. It not only efficiently helps seek out more representative hard negative samples, but also leverages the category center of negatives to enhance the robustness of cross-modal representations. • Extensive experiments on two benchmark datasets validate the superiority of our proposed model in comparison to the state-of-the-art approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
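The HBTR loss in the entry above extends the standard hardest-negative bidirectional triplet ranking loss used widely in cross-modal retrieval. A sketch of that standard baseline over a batch of paired embeddings (this is the baseline, not HBTR itself; HBTR's additional centroid-of-negatives constraint is omitted):

```python
import numpy as np

def bidirectional_triplet_loss(img, txt, margin=0.2):
    """Hardest-negative bidirectional ranking loss over a batch of
    image/text embeddings, where row i of `img` matches row i of `txt`."""
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    S = img @ txt.T                      # cosine similarity matrix
    pos = np.diag(S)                     # matched pairs on the diagonal
    mask = np.eye(len(S), dtype=bool)
    S_neg = np.where(mask, -np.inf, S)   # exclude positives from negatives
    hardest_txt = S_neg.max(axis=1)      # hardest negative text per image
    hardest_img = S_neg.max(axis=0)      # hardest negative image per text
    loss_i2t = np.maximum(0.0, margin + hardest_txt - pos)
    loss_t2i = np.maximum(0.0, margin + hardest_img - pos)
    return (loss_i2t + loss_t2i).mean()

# Well-separated matched pairs incur zero loss; collapsed embeddings pay
# the full margin in both directions.
perfect = bidirectional_triplet_loss(np.eye(4), np.eye(4))
collapsed = bidirectional_triplet_loss(np.ones((3, 2)), np.ones((3, 2)))
```

The abstract's criticism of this baseline is visible here: the single hardest negative may be biased or noisy, which motivates adding a constraint against the negatives' centroid.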
32. Survey of spectral clustering based on graph theory.
- Author
-
Ding, Ling, Li, Chao, Jin, Di, and Ding, Shifei
- Subjects
- *
GRAPH theory , *LAPLACIAN matrices , *TIME complexity , *CUTTING stock problem , *EIGENVECTORS - Abstract
• This paper introduces the basic concept of graph theory, and reviews the properties of the Laplacian matrix and the traditional graph cuts method. Starting from four aspects of the realization process of spectral clustering (construction of the similarity matrix, establishment of the Laplacian matrix, selection of eigenvectors, and determination of the number of clusters), we summarize in detail some representative algorithms of recent years. • Some successful applications of spectral clustering are summarized. In each aspect, the shortcomings of spectral clustering and some representative improved algorithms are emphatically analyzed. • This paper comprehensively analyzes some research on spectral clustering that has not yet been in-depth, and gives prospects on some valuable research directions. Spectral clustering converts the data clustering problem into a graph cut problem and is grounded in graph theory. Due to its reliable theoretical basis and good clustering performance, spectral clustering has been successfully applied in many fields. Although spectral clustering has many advantages, it faces the challenges of high time and space complexity when dealing with large-scale complex data. Firstly, this paper introduces the basic concept of graph theory and reviews the properties of the Laplacian matrix and the traditional graph cuts method. Then, it focuses on four aspects of the realization process of spectral clustering, including the construction of the similarity matrix, the establishment of the Laplacian matrix, the selection of eigenvectors, and the determination of the number of clusters. In addition, some successful applications of spectral clustering are summarized. In each aspect, the shortcomings of spectral clustering and some representative improved algorithms are emphatically analyzed. Finally, the paper comprehensively analyzes some research on spectral clustering that has not yet been in-depth, and gives prospects on some valuable research directions. 
[ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
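The four-step pipeline the survey above is organized around (similarity matrix, Laplacian, eigenvectors, cluster assignment) can be sketched end to end. This minimal version makes one concrete choice at each step: a Gaussian similarity graph, the symmetric normalized Laplacian, the k smallest eigenvectors with row renormalization, and plain k-means with deterministic farthest-point initialization; each choice is one of many the survey reviews:

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0, iters=50):
    """Basic spectral clustering: similarity graph -> normalized
    Laplacian -> spectral embedding -> k-means."""
    # 1. Gaussian (RBF) similarity matrix.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # 2. Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(X)) - (d_inv_sqrt[:, None] * W) * d_inv_sqrt[None, :]
    # 3. k smallest eigenvectors, rows renormalized to unit length.
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
    # 4. k-means on the spectral embedding (farthest-point init).
    idx = [0]
    for _ in range(1, k):
        d = ((U[:, None, :] - U[idx][None, :, :]) ** 2).sum(-1).min(axis=1)
        idx.append(int(d.argmax()))
    centers = U[idx]
    for _ in range(iters):
        labels = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        centers = np.array([U[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return labels

rng = np.random.default_rng(0)
# Two well-separated blobs of 20 points each.
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(3, 0.1, (20, 2))])
labels = spectral_clustering(X, 2)
```

The dense eigendecomposition here is exactly the O(n³) cost the survey identifies as the barrier to large-scale data, which is what the reviewed acceleration techniques target.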
33. Scalable and accurate subsequence transform for time series classification
- Author
-
Mbouopda, Michael Franklin and Mephu Nguifo, Engelbert
- Published
- 2024
- Full Text
- View/download PDF
34. GITGAN: Generative inter-subject transfer for EEG motor imagery analysis
- Author
-
Yin, Kang, Lim, Elissa Yanting, and Lee, Seong-Whan
- Published
- 2024
- Full Text
- View/download PDF
35. Tensorial bipartite graph clustering based on logarithmic coupled penalty.
- Author
-
Liu, Chang, Zhang, Hongbing, Fan, Hongtao, and Li, Yajing
- Subjects
- *
TIME complexity , *CAUCHY sequences , *BIPARTITE graphs , *ALGORITHMS - Abstract
The graph-based multi-view clustering method has gained considerable attention in recent years. However, due to its large time complexity, it is limited to handling small-scale clustering datasets. Moreover, most existing models only consider the similarity within views, neglect the correlation between views, and use the tensor nuclear norm (TNN) as a convex approximation to the tensor rank function. The TNN treats each singular value equally, leading to suboptimal results. To address these issues, this paper proposes a tensorial multi-view clustering model based on bipartite graphs. This paper first introduces a new non-convex logarithmic coupled penalty (LCP) function that treats different singular values differently and preserves the useful structural information. Additionally, a tensorial bipartite graph clustering model based on the logarithmic coupled penalty (LCP-TBGC) is proposed along with a corresponding solution algorithm. The paper also presents a theoretical proof that the resulting sequence converges to a Karush–Kuhn–Tucker (KKT) point. Finally, to validate the effectiveness and superiority of the proposed model, experiments were conducted on eight datasets. • This paper proposes, for the first time, a new logarithmic coupled penalty function designed to better explore the low-rank nature of tensors. • The improved algorithm is used to solve this model, with a theoretical proof showing that it yields a Cauchy sequence that converges to the KKT point. • Experimental results across eight datasets demonstrate the efficacy of the proposed method, which outperforms other clustering strategies of the same type. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. Precise facial landmark detection by Dynamic Semantic Aggregation Transformer.
- Author
-
Wan, Jun, Liu, He, Wu, Yujia, Lai, Zhihui, Min, Wenwen, and Liu, Jun
- Subjects
- *
ARTIFICIAL neural networks , *AMBIGUITY - Abstract
At present, deep neural network methods play a dominant role in the field of face alignment. However, they generally use predefined network structures to predict landmarks, which tend to learn general features and lead to mediocre performance; e.g., they perform well on neutral samples but struggle with faces exhibiting large poses or occlusions. Moreover, they cannot effectively deal with semantic gaps and ambiguities among features at different scales, which may hinder them from learning efficient features. To address the above issues, in this paper, we propose a Dynamic Semantic-Aggregation Transformer (DSAT) for more discriminative and representative feature (i.e., specialized feature) learning. Specifically, a Dynamic Semantic-Aware (DSA) model is first proposed to partition samples into subsets and activate the specific pathways for them by estimating the semantic correlations of feature channels, making it possible to learn specialized features from each subset. Then, a novel Dynamic Semantic Specialization (DSS) model is designed to mine the homogeneous information from features at different scales to eliminate the semantic gaps and ambiguities and enhance the representation ability. Finally, by integrating the DSA model and DSS model into our proposed DSAT in both dynamic architecture and dynamic parameter manners, more specialized features can be learned for achieving more precise face alignment. Interestingly, harder samples can be handled by activating more feature channels. Extensive experiments on popular face alignment datasets demonstrate that our proposed DSAT outperforms state-of-the-art models in the literature. Our code is available at https://github.com/GERMINO-LiuHe/DSAT. • This paper aims to learn specialized features to improve face alignment accuracy. • A novel DSA model is designed to activate specific pathways for each sample. • A DSS model is proposed to alleviate semantic ambiguity of multi-scale features. 
• Our proposed DSAT outperforms SOTA face alignment models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
37. Hierarchical mixture of discriminative Generalized Dirichlet classifiers.
- Author
-
Togban, Elvis and Ziou, Djemel
- Subjects
- *
COLOR space , *MIXTURES , *CLASSIFICATION , *SPAM email - Abstract
This paper presents a discriminative classifier for compositional data. This classifier is based on the posterior distribution of the Generalized Dirichlet, which is the discriminative counterpart of the Generalized Dirichlet mixture model. Moreover, following the mixture-of-experts paradigm, we propose a hierarchical mixture of this classifier. In order to learn the model parameters, we use a variational approximation obtained by deriving an upper bound for the Generalized Dirichlet mixture. To the best of our knowledge, this is the first time this bound has been proposed in the literature. Experimental results are presented for spam detection and color space identification. • This paper addresses the challenge of compositional data classification. • A discriminative classifier based on the Generalized Dirichlet (GD) distribution is proposed. • A meta-classifier, built on the hierarchical mixture-of-experts paradigm, is constructed. • An upper bound for the mixture of GD is proposed, allowing a variational approximation. • The performance of the models is assessed through spam detection and color space identification. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. Global contrast-masked autoencoders are powerful pathological representation learners.
- Author
-
Quan, Hao, Li, Xingyu, Chen, Weixing, Bai, Qun, Zou, Mingchen, Yang, Ruijie, Zheng, Tingting, Qi, Ruiqun, Gao, Xinghua, and Cui, Xiaoyu
- Subjects
- *
ARTIFICIAL intelligence , *IMAGE reconstruction , *HEMATOXYLIN & eosin staining , *IMAGE representation , *COMPUTER-assisted image analysis (Medicine) , *DEEP learning - Abstract
• We have designed two self-supervised pretext tasks: masking image reconstruction and contrastive learning, which can train the encoder to have the ability to represent local-global features. • We discuss the mask ratio, which is suitable for pathology-specific training methodologies based on the masked image modeling paradigm. • We selected three pathological image datasets and proved the effectiveness of GCMAE algorithm through extensive experiments. • An automatic pathology image diagnosis process was designed based on the GCMAE to improve the credibility of the model in clinical applications. Using digital pathology slide scanning technology, artificial intelligence algorithms, particularly deep learning, have achieved significant results in the field of computational pathology. Compared to other medical images, pathology images are more difficult to annotate, and thus, there is an extreme lack of available datasets for conducting supervised learning to train robust deep learning models. In this paper, we introduce a self-supervised learning (SSL) model, the Global Contrast-masked Autoencoder (GCMAE), designed to train encoders to capture both local and global features of pathological images and significantly enhance the performance of transfer learning across datasets. Our study demonstrates the capability of the GCMAE to learn transferable representations through extensive experiments on three distinct disease-specific hematoxylin and eosin (H&E)-stained pathology datasets: Camelyon16, NCT-CRC, and BreakHis. Moreover, we propose an effective automated pathology diagnosis process based on the GCMAE for clinical applications. The source code of this paper is publicly available at https://github.com/StarUniversus/gcmae. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
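One of the GCMAE entry's two pretext tasks is masked image reconstruction, for which the mask ratio is the hyperparameter the authors tune for pathology images. A minimal sketch of MAE-style random patch masking (the patch size and 0.75 ratio here are illustrative defaults, not the paper's tuned values, and the contrastive branch is not shown):

```python
import numpy as np

def random_mask_patches(img, patch=4, ratio=0.75, seed=0):
    """Split a 2D image into non-overlapping patches and zero out a
    random subset at the given mask ratio; returns the masked image
    and the boolean patch mask (True = hidden from the encoder)."""
    h, w = img.shape
    ph, pw = h // patch, w // patch
    n = ph * pw
    rng = np.random.default_rng(seed)
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=int(n * ratio), replace=False)] = True
    out = img.copy()
    for i in range(ph):
        for j in range(pw):
            if mask[i * pw + j]:
                out[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = 0.0
    return out, mask

img = np.ones((16, 16))          # toy stand-in for a pathology tile
masked, mask = random_mask_patches(img, patch=4, ratio=0.75)
```

In an MAE-style setup the encoder sees only the visible patches and a decoder reconstructs the masked ones, which is the "local feature" half of the local-global objective the abstract describes.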
39. Dynamic weighted knowledge distillation for brain tumor segmentation.
- Author
-
An, Dianlong, Liu, Panpan, Feng, Yan, Ding, Pengju, Zhou, Weifeng, and Yu, Bin
- Subjects
- *
BRAIN tumors , *NUMBER concept , *CANCER diagnosis , *IMAGE analysis , *SOURCE code - Abstract
• A novel knowledge distillation framework to compress brain tumor segmentation models. • Proposing the Dynamic Weighted Knowledge Distillation (DWKD) algorithm. • Introducing the Regularized Cross-Entropy (RCE) loss function to enhance the model's robustness. • We have confirmed that DWKD demonstrates superior capability in enhancing model interpretability compared to SKD. • Compared to SKD, our proposed method outperforms on the BraTS 2019, BraTS 2020, and BraTS 2021 datasets. Automatic 3D MRI brain tumor segmentation holds a crucial position in the field of medical image analysis, contributing significantly to the clinical diagnosis and treatment of brain tumors. However, traditional 3D brain tumor segmentation methods often entail extensive parameters and computational demands, posing substantial challenges in model training and deployment. To overcome these challenges, this paper introduces a brain tumor segmentation framework based on knowledge distillation, which trains a lightweight network by extracting knowledge from a well-established brain tumor segmentation network. Firstly, this framework replaces conventional static knowledge distillation (SKD) with the proposed dynamic weighted knowledge distillation (DWKD). DWKD dynamically adjusts the distillation loss weights for each pixel based on the learning state of the student network. Secondly, to enhance the student network's generalization capability, this paper customizes a loss function for DWKD, known as regularized cross-entropy (RCE). RCE injects controlled noise into the model, enhancing its robustness and diminishing the risk of overfitting. Lastly, empirical validation of the proposed methodology is conducted using two distinct backbone networks, namely Attention U-Net and Residual U-Net, with rigorous experimentation across the BraTS 2019, BraTS 2020, and BraTS 2021 datasets. 
Experimental results demonstrate that DWKD exhibits significant advantages over SKD in enhancing the segmentation performance of the student network. Furthermore, when dealing with limited training data, the RCE method can further improve the student network's segmentation performance. Additionally, this paper quantitatively analyzes the number of concept detectors identified in network dissection. It assesses the impact of DWKD on model interpretability and finds that compared to SKD, DWKD can more effectively enhance model interpretability. The source code is available at https://github.com/YuBinLab-QUST/DWKD/. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
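The core of DWKD as the abstract describes it is a per-pixel distillation loss whose weights depend on the student's learning state. A sketch of that shape of loss; the weighting rule below (halving the weight of pixels where student and teacher already agree) is an illustrative stand-in for the paper's formula, which is not reproduced in the abstract:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_weighted_kd_loss(student_logits, teacher_logits, T=2.0):
    """Per-pixel KL distillation with a dynamic per-pixel weight.
    Logits have shape (H, W, C); temperature T softens both distributions."""
    ps = softmax(student_logits / T)
    pt = softmax(teacher_logits / T)
    # Per-pixel KL(teacher || student).
    kl = (pt * (np.log(pt + 1e-12) - np.log(ps + 1e-12))).sum(-1)
    # Illustrative "learning state": down-weight already-learned pixels.
    agree = ps.argmax(-1) == pt.argmax(-1)
    weight = np.where(agree, 0.5, 1.0)
    return (weight * kl).mean() * T * T   # usual T^2 gradient rescaling

rng = np.random.default_rng(0)
s = rng.normal(size=(4, 4, 3))
t = rng.normal(size=(4, 4, 3))
zero_loss = dynamic_weighted_kd_loss(s, s)   # identical logits -> zero KL
pos_loss = dynamic_weighted_kd_loss(s, t)
```

Static distillation (SKD) corresponds to `weight` being constant; making it a function of the student's current predictions is what "dynamic" adds.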
40. Contrastive cross-modal clustering with twin network.
- Author
-
Mao, Yiqiao, Yan, Xiaoqiang, Hu, Shizhe, and Ye, Yangdong
- Subjects
- *
DESIGN - Abstract
Cross-modal clustering (CMC) methods explore the correlation information between multiple modalities to improve clustering performance. However, the obvious differences between heterogeneous modalities make it difficult to obtain the correlation information directly. In this paper, we propose a novel Contrastive Cross-modal Clustering with Twin Network (3CTnet) for CMC, which contrasts the differences between multiple modalities to fully mine the correlation information. The 3CTnet contains two modality-specific encoders and an attention-based correlation propagation module (CPM). First, the modality-specific encoders are trained with pseudo-labels to learn the clustering structure and features of a single modality. Then we contrast the clustering structures and features of different modalities to explore the inter-cluster and inter-feature correlation information simultaneously. Finally, the CPM is designed to propagate the learned correlation information among the modality-specific encoders to further optimize the learning of features and clustering structures. The experiments show that 3CTnet outperforms the state-of-the-art CMC methods on six large datasets. • In this paper, a novel 3CTnet method is proposed for cross-modal clustering. • We contrast the differences between multiple modalities to mine correlation information. • A correlation propagation module is designed to propagate the correlation information. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
41. Advancing Supervised Learning with the Wave Loss Function: A Robust and Smooth Approach.
- Author
-
Akhtar, Mushir, Tanveer, M., and Arshad, Mohd.
- Subjects
- *
SUPERVISED learning , *MACHINE learning , *SUPPORT vector machines , *WAVE functions , *ALZHEIMER'S disease - Abstract
The loss function plays a vital role in supervised learning frameworks, and selecting an appropriate loss function can substantially affect the proficiency of the learned model. The training of supervised learning algorithms inherently adheres to predetermined loss functions during the optimization process. In this paper, we present a novel contribution to the realm of supervised machine learning: an asymmetric loss function named wave loss. It exhibits robustness against outliers, insensitivity to noise, boundedness, and a crucial smoothness property. Theoretically, we establish that the proposed wave loss function manifests the essential characteristic of being classification-calibrated. Leveraging this, we incorporate the proposed wave loss function into the least squares setting of support vector machines (SVM) and twin support vector machines (TSVM), resulting in two robust and smooth models termed Wave-SVM and Wave-TSVM, respectively. To address the optimization problem inherent in Wave-SVM, we utilize the adaptive moment estimation (Adam) algorithm, which confers multiple benefits, including adaptive learning rates, efficient memory utilization, and faster convergence during training. It is noteworthy that this paper marks the first instance of Adam's application to solve an SVM model. Further, we devise an iterative algorithm to solve the optimization problems of Wave-TSVM. To empirically showcase the effectiveness of the proposed Wave-SVM and Wave-TSVM, we evaluate them on benchmark UCI and KEEL datasets (with and without feature noise) from diverse domains. Moreover, to exemplify the applicability of Wave-SVM in the biomedical domain, we evaluate it on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. The experimental outcomes reveal that Wave-SVM and Wave-TSVM achieve superior prediction accuracy over the baseline models. 
The source codes of the proposed models are publicly available at https://github.com/mtanveer1/Wave-SVM. • A new asymmetric, bounded, and smooth loss function termed wave loss is proposed. • Theoretically, we analyzed the classification-calibrated characteristic of the wave loss function. • Two new robust and smooth models, termed Wave-SVM and Wave-TSVM, are proposed. • The Adam algorithm is used to solve the optimization problem of the Wave-SVM. • The optimization problem of the Wave-TSVM is solved by an efficient iterative method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
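The pairing the abstract highlights, a smooth bounded margin loss optimized by Adam on an SVM-style objective, can be sketched on a linear model. The loss below, l(m) = 1 − tanh(m) on the margin m = y·(w·x + b), is an illustrative smooth bounded surrogate, not the paper's wave loss (whose exact asymmetric form is not given in the abstract); the Adam update itself is implemented from its standard bias-corrected equations:

```python
import numpy as np

def adam_fit_svm(X, y, lam=0.01, lr=0.05, steps=300):
    """Adam minimizing  mean(1 - tanh(y*(Xw+b))) + (lam/2)*||w||^2,
    a linear-SVM-style objective with a smooth, bounded margin loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    m1, m2 = np.zeros(d + 1), np.zeros(d + 1)   # Adam moment estimates
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    for t in range(1, steps + 1):
        margins = y * (X @ w + b)
        dl = -(1.0 - np.tanh(margins) ** 2)      # l'(m) = -sech^2(m)
        gw = (dl * y) @ X / n + lam * w
        gb = (dl * y).mean()
        g = np.concatenate([gw, [gb]])
        m1 = beta1 * m1 + (1 - beta1) * g
        m2 = beta2 * m2 + (1 - beta2) * g * g
        mh = m1 / (1 - beta1 ** t)               # bias correction
        vh = m2 / (1 - beta2 ** t)
        step = lr * mh / (np.sqrt(vh) + eps)
        w -= step[:d]
        b -= step[d]
    return w, b

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)   # linearly separable toy data
w, b = adam_fit_svm(X, y)
acc = (np.sign(X @ w + b) == y).mean()
```

Because the loss is bounded, gradients from far-misclassified points saturate, which is the outlier-robustness property the abstract attributes to the wave loss family.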
42. Robust feature selection via central point link information and sparse latent representation.
- Author
-
Kong, Jiarui, Shang, Ronghua, Zhang, Weitong, Wang, Chao, and Xu, Songhua
- Subjects
- *
FEATURE selection , *LAPLACIAN matrices , *SPARSE matrices , *DATA mining - Abstract
• This paper proposes a novel unsupervised feature selection method called CPSLR. • CPSLR uses a specific formula to obtain the central point matrix, then constructs a link graph through the central matrix and the Laplacian matrix to retain the similarity between the data. • The link graph and the data graph form a dual graph structure, which not only preserves more complete data information but also retains the manifold structure of the data. • Feature selection is conducted in the latent representation space, and interconnection information among data is mined by using latent representation learning to preserve the connections among the data itself. • CPSLR applies an ℓ2,1/2-norm constraint on the feature transformation matrix to select robust and low-redundancy features. Before conducting unsupervised feature selection, it is usually assumed that the data are independent of each other. On the contrary, real data influence each other, so traditional feature selection methods may lose the information relating data to each other. This can lead to inaccurately generated pseudo-label information and may result in poor feature selection results. To address this issue, this paper proposes robust feature selection via central point link information and sparse latent representation (CPSLR). Firstly, CPSLR structures a link graph by calculating the center matrix to store the distance information from each sample to the center point. If two samples have similar distances to the center point, they can be determined to belong to the same class; therefore, the similarity between samples is preserved and more accurate pseudo-label information is obtained. Secondly, CPSLR uses the data graph and the link graph to form a dual graph structure, which retains not only the link information between samples but also the manifold structure of the samples. Then, CPSLR saves the interconnection content between samples by sparse latent representation: an ℓ2,1-norm constraint is exerted on the latent representation, and sparse, non-redundant interconnection information is preserved. Combining central point link information with sparse latent representation makes the interconnections between data more comprehensively reserved; that is to say, the obtained pseudo-labels are closer to the real class labels. Finally, CPSLR constrains the feature transformation matrix by an ℓ2,1/2-norm constraint so as to select robust and sparse features: the ℓ2,1/2-norm constraint ensures that the feature transformation matrix is sparse, selecting more discriminative features and thereby improving feature selection efficiency. The experiments demonstrate that the clustering results of CPSLR outperform six classical or recent compared algorithms on eight datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
43. Certainty weighted voting-based noise correction for crowdsourcing.
- Author
-
Li, Huiru, Jiang, Liangxiao, and Li, Chaoqun
- Subjects
- *
CROWDSOURCING , *INDIVIDUALS' preferences , *NOISE , *CERTAINTY , *POPULAR music genres - Abstract
In crowdsourcing scenarios, we can obtain a multiple noisy label set for each instance from different workers and then use a ground truth inference algorithm to infer its integrated label. Despite the effectiveness of ground truth inference algorithms, there is still a certain level of noise in integrated labels. To reduce the impact of noise, many noise correction algorithms have been proposed in recent years. To the best of our knowledge, almost all these algorithms assume that workers have the same labeling certainty on different classes and instances. However, this is rarely true in reality due to differences in workers' individual preferences and cognitive abilities. In this paper, we argue that the labeling certainty of a worker should be class-dependent and instance-dependent. Based on this premise, we propose a certainty weighted voting-based noise correction (CWVNC) algorithm. First, we use the consistency between worker-labeled labels and integrated labels on different classes to estimate the class-dependent certainty. Then, we train a probability-based classifier on the instances labeled by each worker separately and use it to estimate the instance-dependent certainty. Finally, we correct the integrated label of each instance by weighted voting based on class-dependent certainty and instance-dependent certainty. When examined, CWVNC achieves an average noise ratio of 15.08% on 34 simulated datasets, and noise ratios of 25.77% and 26.94% on the two real-world datasets "Income" and "Music_genre", respectively. The results show that CWVNC significantly outperforms all other state-of-the-art noise correction algorithms used for comparison. • Crowdsourcing provides an effective way to collect labels from crowd workers. • Noise correction algorithms have been proposed to reduce the noise. • Existing algorithms assume that workers have the same labeling certainty. 
• This paper proposes a certainty weighted voting-based noise correction algorithm. • The extensive experiments validate the effectiveness of the proposed algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
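The final correction step in the entry above, weighted voting under class-dependent worker certainty, reduces to a few lines once the certainties are given. A toy sketch with hand-picked certainty values; estimating those certainties (from worker/integrated-label consistency and a per-worker classifier) is the paper's actual contribution and is not reproduced here:

```python
import numpy as np  # not strictly needed here; kept for consistency with the listing's other sketches

def certainty_weighted_vote(labels, certainty):
    """Weighted vote for one instance: each worker's vote counts with
    that worker's certainty for the class it voted for."""
    scores = {}
    for worker, lab in enumerate(labels):
        scores[lab] = scores.get(lab, 0.0) + certainty[worker][lab]
    return max(scores, key=scores.get)

# Three workers label one instance; certainties are class-dependent
# (illustrative values, not estimated as in the paper).
votes = ["A", "B", "B"]
certainty = [
    {"A": 0.95, "B": 0.30},   # worker 0: highly reliable on class A
    {"A": 0.40, "B": 0.35},
    {"A": 0.40, "B": 0.35},
]
corrected = certainty_weighted_vote(votes, certainty)
```

Note that the single high-certainty vote for "A" (score 0.95) overrides the plain two-vote majority for "B" (score 0.70), which is exactly the behavior that distinguishes certainty-weighted voting from majority voting.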
44. HDR light field imaging of dynamic scenes: A learning-based method and a benchmark dataset.
- Author
-
Chen, Yeyao, Jiang, Gangyi, Yu, Mei, Jin, Chongchong, Xu, Haiyong, and Ho, Yo-Sung
- Subjects
- *
HIGH dynamic range imaging , *FEATURE extraction , *IMAGE sensors , *POTENTIAL well - Abstract
• A novel learning-based method is proposed for ghost-free high dynamic range (HDR) light field imaging. • A multi-scale architecture integrating deformable alignment module and angular embedding module is designed. • A new large-scale benchmark dataset is established to serve the HDR light field imaging task for dynamic scenes. • The proposed method achieves superior spatial quality and preserves accurate angular consistency. Light field (LF) imaging is an effective way to enable immersive applications. However, limited by the potential well capacity of the image sensor, the acquired LF images suffer from low dynamic range and are thus prone to under-exposure or over-exposure. High dynamic range (HDR) LF imaging is an efficacious avenue to improve the LF imaging's dynamic range. Unfortunately, for dynamic scenes, existing methods are inclined to produce ghosting artifacts and lose details in the saturated regions, while potentially damaging the parallax structure of generated HDR LF images. To address the above challenges, in this paper, we propose a new ghost-free HDR LF imaging method based on a deformable aggregation and angular embedding network. Specifically, considering the four-dimensional geometric structure of the LF image, a deformable alignment module is first designed to handle dynamic regions in the spatial domain, and then the aligned spatial features are fully fused through an aggregation operation. Subsequently, an angular embedding module is constructed to explore angular information to enhance the aggregated spatial features. Based on this, the above two modules are cascaded in a multi-scale manner to achieve multi-level feature extraction and enhance the feature representation ability. Finally, a decoder is leveraged to recover the ghost-free HDR LF image from the enhanced multi-scale features. For performance evaluation, this paper establishes a large-scale benchmark dataset with multi-exposure inputs and ground truth images. 
Extensive experimental results show that the proposed method generates visually pleasing HDR LF images while preserving accurate angular consistency. Moreover, the proposed method surpasses the state-of-the-art methods in both quantitative and qualitative comparisons. The code and dataset will be available at https://github.com/YeyaoChen/HDRLFI. [ABSTRACT FROM AUTHOR]
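The core idea of the deformable alignment step described in this abstract can be illustrated in one dimension: features of a source view are sampled at positions shifted by learned offsets, then fused with the reference. This is only a minimal sketch, not the authors' network; the function names and the averaging fusion are hypothetical stand-ins for the learned aggregation.

```python
def sample_with_offset(feat, idx, offset):
    """Linearly interpolate feat at the fractional position idx + offset."""
    pos = idx + offset
    lo = max(0, min(len(feat) - 2, int(pos)))
    frac = min(max(pos - lo, 0.0), 1.0)
    return feat[lo] * (1 - frac) + feat[lo + 1] * frac

def deformable_align(ref, src, offsets):
    """Warp each position of src toward ref using per-position offsets,
    then fuse by averaging (a stand-in for the aggregation operation)."""
    aligned = [sample_with_offset(src, i, offsets[i]) for i in range(len(src))]
    return [(r + a) / 2 for r, a in zip(ref, aligned)]
```

In the real method the offsets are predicted by a network and the sampling is two-dimensional deformable convolution, but the interpolate-at-offset mechanic is the same.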
- Published
- 2024
- Full Text
- View/download PDF
45. Fusion-competition framework of local topology and global texture for head pose estimation.
- Author
-
Ma, Dongsheng, Fu, Tianyu, Yang, Yifei, Cao, Kaibin, Fan, Jingfan, Xiao, Deqiang, Song, Hong, Gu, Ying, and Yang, Jian
- Subjects
- *
POINT cloud , *SUBSPACES (Mathematics) - Abstract
• The proposed method combines heterogeneous data to fully utilize the texture information of the RGB image and the geometric information of the point cloud. Compared with a depth image, a point cloud carries stronger topological features, which can be learned together with texture features for accurate and robust head pose estimation. • The proposed framework achieves feature fusion at the texture-topology level and generates feature competition among the local regions. This fusion-competition framework enhances the expression of features of different categories at different levels, decreasing the estimation error and increasing stability. • This paper constructs an RGB-Depth dataset using HoloLens2 for training and testing in head pose estimation. The dataset contains abundant head pose samples, including 24 sessions with 12 K frames from 21 males and 1 female, and the ground-truth pose in each frame is labeled by an accurate tracking device attached to the head. RGB images and point clouds carry texture and geometric structure, which are widely used for head pose estimation. However, images lack spatial information, and the quality of a point cloud is easily affected by sensor noise. In this paper, a novel fusion-competition framework (FCF) is proposed to overcome the limitations of a single modality. Global texture information is extracted from the image and local topology information is extracted from the point cloud to project the heterogeneous data into a common feature subspace. The projected texture feature, weighted by a channel attention mechanism, is embedded into each local point cloud region with different topological features for fusion. A scoring mechanism creates competition among the regions involving local-global fused features, and the final pose is predicted by the region with the highest score. 
According to the evaluation results on the public dataset and our constructed dataset, the FCF improves the estimation accuracy and stability by an average of 13.6% and 12.7%, respectively, compared with nine state-of-the-art methods. [ABSTRACT FROM AUTHOR]
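The two mechanisms named in this abstract, channel-attention weighting of the texture feature and score-based competition among local regions, can be sketched as follows. This is an illustrative toy version under assumed shapes, not the authors' FCF implementation; all names are hypothetical.

```python
import math

def channel_attention(texture_feat):
    """Gate each channel by a sigmoid of its own activation
    (a stand-in for the learned channel attention weighting)."""
    return [f * (1.0 / (1.0 + math.exp(-f))) for f in texture_feat]

def competition(region_scores, region_poses):
    """Scoring mechanism: the local region with the highest score
    wins and its pose prediction is taken as the final output."""
    best = max(range(len(region_scores)), key=lambda i: region_scores[i])
    return region_poses[best]
```

In the paper the scores come from local-global fused features; here they are given directly to show only the winner-takes-all selection.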
- Published
- 2024
- Full Text
- View/download PDF
46. Multi-scale hypergraph-based feature alignment network for cell localization.
- Author
-
Li, Bo, Zhang, Yong, Zhang, Chengyang, Piao, Xinglin, Hu, Yongli, and Yin, Baocai
- Subjects
- *
CELL imaging , *GRAPH algorithms , *IMAGE analysis , *CELL morphology , *MULTISCALE modeling , *BIOLOGICAL networks , *HYPERGRAPHS - Abstract
Cell localization in medical image analysis is a challenging task due to the significant variation in cell shape, size and color. Existing localization methods continue to tackle these challenges separately, frequently facing complications where these difficulties intersect and adversely impact model performance. In this paper, these challenges are first reframed as issues of feature misalignment between cell images and location maps, which are then collectively addressed. Specifically, we propose a feature alignment model based on a multi-scale hypergraph attention network. The model considers local regions in the feature map as nodes and utilizes a learnable similarity metric to construct hypergraphs at various scales. We then utilize a hypergraph convolutional network to aggregate the features associated with the nodes and achieve feature alignment between the cell images and location maps. Furthermore, we introduce a stepwise adaptive fusion module to fuse features at different levels effectively and adaptively. The comprehensive experimental results demonstrate the effectiveness of our proposed multi-scale hypergraph attention module in addressing the issue of feature misalignment, and our model achieves state-of-the-art performance across various cell localization datasets. • This paper innovatively addresses the challenges stemming from significant variations in cell shape, scale, and color by reframing them as a feature misalignment problem between cell images and location maps, thereby presenting a unified solution to these complexities. • We propose an innovative multi-scale hypergraph attention module that achieves feature alignment through the adaptive aggregation of features across various scale ranges. • The proposed model achieves state-of-the-art performance on multiple cell localization datasets and reveals great potential. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
47. PDRLRR: A novel low-rank representation with projection distance regularization via manifold optimization for clustering.
- Author
-
Chen, Haoran, Chen, Xu, Tao, Hongwei, Li, Zuhe, and Wang, Boyue
- Subjects
- *
MACHINE learning , *DATA reduction - Abstract
The low-rank representation (LRR) method has attracted widespread attention due to its excellent performance in pattern recognition and machine learning. LRR-based variants have been proposed to solve three existing problems in LRR: (1) the projection matrix is permanently fixed when dimensionality reduction techniques are adopted; (2) LRR fails to capture the local geometric structure; and (3) the solution deviates from the real low-rank solution. To address these problems, this paper proposes a low-rank representation with projection distance regularization (PDRLRR) via manifold optimization for clustering. In detail, we first introduce a low-dimensional projection matrix and a projection distance regularization term to fit the projected data automatically and capture the local structure of the data, respectively. Consequently, the projection matrix and representation matrix are obtained jointly. Then, we obtain a more accurate low-rank solution by minimizing the Schatten-p norm instead of the nuclear norm. Next, the projection matrix is optimized over a generalized Stiefel manifold. Extensive experiments demonstrate that our proposed method outperforms the state-of-the-art methods. • This paper proposes a novel PDRLRR model that can simultaneously address the three common problems in LRR. • Data dimensionality reduction technology is integrated into LRR, reducing the data dimensions while learning representation matrices. • To extract complete information, the projection distance regularization term is introduced to capture the global and local structure of the data. • The Schatten-p norm instead of the nuclear norm is employed to solve the rank minimization problem of the representation matrix, which can more accurately approximate the real low-rank solution. [ABSTRACT FROM AUTHOR]
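The Schatten-p norm mentioned in this abstract is computed from the singular values of a matrix: ||X||_{S_p} = (Σ_i σ_i^p)^(1/p), where p = 1 recovers the nuclear norm, and as p decreases toward 0 the sum Σ_i σ_i^p approaches the rank, which is why it approximates the low-rank solution more tightly. A minimal sketch (using a diagonal matrix so the singular values are known without an SVD):

```python
def schatten_p(singular_values, p):
    """Schatten-p norm: (sum_i sigma_i^p)^(1/p); p = 1 is the nuclear norm."""
    return sum(s ** p for s in singular_values) ** (1.0 / p)

# For the diagonal matrix diag(3, 1) the singular values are simply (3, 1).
sigmas = [3.0, 1.0]
nuclear = schatten_p(sigmas, 1.0)   # nuclear norm: 3 + 1 = 4
sp_half = schatten_p(sigmas, 0.5)   # Schatten-1/2 quasi-norm: (sqrt(3) + 1)^2
```

For a general matrix the singular values would come from an SVD; the optimization over the generalized Stiefel manifold in the paper is a separate step not shown here.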
- Published
- 2024
- Full Text
- View/download PDF
48. SED: Searching Enhanced Decoder with switchable skip connection for semantic segmentation.
- Author
-
Zhang, Xian, Quan, Zhibin, Li, Qiang, Zhu, Dejun, and Yang, Wankou
- Subjects
- *
IMAGE segmentation , *OCCUPATIONAL retraining , *SPINE , *TEMPORAL lobe - Abstract
Neural architecture search (NAS) has shown excellent performance. However, existing semantic segmentation models rely heavily on pre-training on ImageNet or COCO and mainly focus on the design of decoders. Directly training encoder-decoder architecture search models from scratch to SOTA for semantic segmentation can require thousands of GPU days, which greatly limits the application of NAS. To address this issue, we propose a novel neural architecture Search framework for an Enhanced Decoder (SED). Utilizing a pre-trained hand-designed backbone and a search space composed of light-weight cells, SED searches for a decoder that can perform high-quality segmentation. Furthermore, we attach switchable skip connection operations to the search space, expanding the diversity of possible network structures. The parameters of the backbone and of the operations selected in the searching phase are copied to the retraining process. As a result, searching, pruning and retraining can be done in just 1 day. The experimental results show that the SED proposed in this paper needs only 1/4 of the parameters and computation of a hand-designed decoder, and obtains higher segmentation accuracy on Cityscapes. Transferring the same decoder architecture to other datasets, such as Pascal VOC 2012, CamVid and ADE20K, proves the robustness of SED. • For the task of image semantic segmentation, we propose a gradient-based, pre-trainable neural network architecture search framework, SED. In this paper we simultaneously consider decoder and skip connection search. Our method maximizes the advantages of NAS and a pre-trained backbone. • SED can compress the retraining iterations to several thousand. The whole searching, pruning and retraining process can be compressed into 1 day. Furthermore, after searching on Cityscapes, the searched network architecture can achieve 80.2% mIoU. [ABSTRACT FROM AUTHOR]
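Two of the ideas in this abstract, a skip connection that the search can switch on or off, and discretizing the search by keeping the candidate operation with the strongest architecture weight, can be sketched abstractly. This is a toy illustration of the general NAS mechanic, not the SED implementation; the names are hypothetical.

```python
def switchable_skip(x, transform, gate_logit):
    """A skip connection the search can switch: a positive gate logit
    keeps the residual path, a negative one drops it."""
    out = transform(x)
    return out + x if gate_logit > 0 else out

def select_op(candidates, logits):
    """Discretize (prune) the search: keep the candidate operation
    whose architecture logit is largest."""
    return candidates[max(range(len(logits)), key=lambda i: logits[i])]
```

During search the gate logits would be learned jointly with the weights by gradient descent; pruning then fixes the argmax choices before retraining.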
- Published
- 2024
- Full Text
- View/download PDF
49. Robust multi-scale weighting-based edge-smoothing filter for single image dehazing.
- Author
-
Yadav, Sumit Kr. and Sarawadekar, Kishor
- Subjects
- *
COST functions , *REGULARIZATION parameter , *SMOOTHING (Numerical analysis) , *QUANTITATIVE research , *HAZE - Abstract
The guided image filter (GIF) and weighted guided image filter (WGIF) are effective edge-preserving filters based on a local linear model. However, due to a fixed regularization parameter, they suffer from halo artifacts (morphological artifacts) in sharp regions. To overcome this issue, a robust multi-scale weighting-based edge-smoothing filter (RMWEF) for single image dehazing is proposed in this paper. It strongly suppresses morphological artifacts and over-smoothing while precisely preserving edge information in both flat and sharp regions. The proposed dehazing method has four steps. First, the initial transmission map and atmospheric map are estimated using a novel dark channel prior (DCP) method. Then, the morphological artifacts of the initial transmission map are reduced using a non-local haze line averaging (NL-HLA) method. In the third step, the transmission map is refined using the proposed RMWEF. Finally, the haze-free image is restored. Theoretical and experimental analysis proves that the proposed algorithm produces effective dehazing results faster than existing methods. • A robust multi-scale weighting-based edge-smoothing filter (RMWEF) is proposed in this paper. • The value of the cost function a_{x′,y′} must vary in the range 0 to 1 depending on the edge-aware smoothing parameter γ_{x′,y′}. The mathematical formulation presented in this paper shows that the proposed filter maintains this relationship, as expected. • This article presents a quantitative analysis for three different values of the regularization parameter ϵ, viz. 0.01², 0.1² and 1², showing the trade-off between the regularization parameter and morphological artifacts. • The proposed RMWEF strongly removes morphological artifacts and over-smoothing effects in fine structures, and preserves details in such areas very well by choosing a large window radius ζ₁ = 60. 
• The proposed method is tested on 6,618 images from different datasets and the results are compared with 9 existing methods. The experimental results show that its performance is independent of the nature of the images. [ABSTRACT FROM AUTHOR]
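The first step of the pipeline in this abstract, estimating an initial transmission map via the dark channel prior, follows a well-known recipe: take the per-pixel minimum over a local window and all color channels, then set t = 1 − ω · dark(I/A). A minimal sketch on a tiny image (assuming atmospheric light A = 1 for simplicity; this is the classic DCP, not the paper's novel variant):

```python
def dark_channel(img, radius):
    """Dark channel: per-pixel minimum over a local window and all color
    channels. img is a list of rows of (r, g, b) tuples in [0, 1]."""
    h, w = len(img), len(img[0])
    dark = [[1.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    dark[y][x] = min(dark[y][x], *img[yy][xx])
    return dark

def transmission(dark, omega=0.95):
    """Initial transmission estimate t = 1 - omega * dark,
    assuming normalized atmospheric light A = 1."""
    return [[1.0 - omega * d for d in row] for row in dark]
```

The NL-HLA artifact reduction and the RMWEF refinement described in the abstract would then operate on this initial map.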
- Published
- 2024
- Full Text
- View/download PDF
50. Deep federated learning hybrid optimization model based on encrypted aligned data.
- Author
-
Zhao, Zhongnan, Liang, Xiaoliang, Huang, Hai, and Wang, Kun
- Subjects
- *
DEEP learning , *FEDERATED learning , *BLENDED learning , *GAUSSIAN mixture models , *SEARCH algorithms , *FEATURE extraction , *RECEIVER operating characteristic curves - Abstract
• Improving the quality of federated learning encrypted aligned data. • Gaussian mixture clustering is used to cluster samples, and a threshold is set to filter samples. • An encrypted sample attribute searching algorithm is used to fill in missing sample values. • A combined model of variational autoencoder Gaussian mixture clustering and federated learning is designed. Federated learning can achieve multi-party data-collaborative applications while safeguarding personal privacy. However, the process often leads to a decline in the quality of sample data due to a substantial amount of missing encrypted aligned data, and there is a lack of research on how to improve the model learning effect by increasing the number of encrypted aligned data samples in federated learning. Therefore, this paper integrates the functional characteristics of deep learning models and proposes a Variational AutoEncoder Gaussian Mixture Model Clustering Vertical Federated Learning Model (VAEGMMC-VFL), which leverages the feature extraction capability of the autoencoder and the clustering and pattern discovery capabilities of Gaussian mixture clustering on diverse datasets to uncover a large number of potentially usable samples. First, the variational autoencoder is used to achieve dimensionality reduction and sample feature reconstruction of high-dimensional data samples. Subsequently, Gaussian mixture clustering is employed to partition the dataset into multiple potential Gaussian-distributed clusters and filter the sample data using thresholding. Additionally, the paper introduces a labeled sample attribute value finding algorithm to fill in attribute values for encrypted unaligned samples that meet the requirements, allowing for full recovery of the encrypted unaligned data. 
In the experimental section, the paper selects four datasets from different industries and compares the proposed method with three federated learning clustering methods in terms of clustering loss, reconstruction loss and other metrics. Tests on precision, accuracy, recall, ROC curve and F1-score indicate that the proposed method outperforms similar approaches. [ABSTRACT FROM AUTHOR]
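The "cluster, then filter by threshold" step in this abstract can be illustrated with a fixed one-dimensional two-component Gaussian mixture: each sample's posterior responsibility for its best cluster is computed, and samples whose best responsibility falls below the threshold are discarded as ambiguous. A toy sketch with hand-set mixture parameters (the paper fits the mixture on VAE latents instead; the names here are hypothetical):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a univariate Gaussian N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def filter_by_posterior(samples, comps, thresh):
    """Keep samples whose best-cluster posterior responsibility meets the
    threshold; comps is a list of (weight, mu, sigma) mixture components."""
    kept = []
    for x in samples:
        probs = [w * gaussian_pdf(x, mu, s) for w, mu, s in comps]
        total = sum(probs)
        if total > 0 and max(probs) / total >= thresh:
            kept.append(x)
    return kept
```

A sample halfway between two equal clusters gets responsibility 0.5 for each and is filtered out, while samples near a cluster center are kept.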
- Published
- 2024
- Full Text
- View/download PDF