Descriptor: "training data" / Publication Type: Magazines - Searchworks@Jio Institute Digital Library Search Results

1. Stochastic Mirror Descent on Overparameterized Nonlinear Models.

Author: Azizan, Navid, Lale, Sahin, and Hassibi, Babak
Subjects: *MACHINE learning, *MIRRORS, *LEARNING problems, *DEEP learning
Abstract: Most modern learning problems are highly overparameterized, i.e., have many more model parameters than the number of training data points. As a result, the training loss may have infinitely many global minima (parameter vectors that perfectly “interpolate” the training data). It is therefore imperative to understand which interpolating solutions we converge to, how they depend on the initialization and learning algorithm, and whether they yield different test errors. In this article, we study these questions for the family of stochastic mirror descent (SMD) algorithms, of which stochastic gradient descent (SGD) is a special case. Recently, it has been shown that for overparameterized linear models, SMD converges to the closest global minimum to the initialization point, where closeness is in terms of the Bregman divergence corresponding to the potential function of the mirror descent. With appropriate initialization, this yields convergence to the minimum-potential interpolating solution, a phenomenon referred to as implicit regularization. On the theory side, we show that for sufficiently-overparameterized nonlinear models, SMD with a (small enough) fixed step size converges to a global minimum that is “very close” (in Bregman divergence) to the minimum-potential interpolating solution, thus attaining approximate implicit regularization. On the empirical side, our experiments on the MNIST and CIFAR-10 datasets consistently confirm that the above phenomenon occurs in practical scenarios. 
They further indicate a clear difference in the generalization performance of different SMD algorithms: experiments on the CIFAR-10 dataset with different regularizers, $\ell_1$ to encourage sparsity, $\ell_2$ (SGD) to encourage small Euclidean norm, and $\ell_\infty$ to discourage large components, surprisingly show that the $\ell_\infty$ norm consistently yields better generalization performance than SGD, which in turn generalizes better than the $\ell_1$ norm. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF
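
Illustrative sketch for record 1 (not part of the catalog entry or the authors' code): a minimal stochastic mirror descent loop on an overparameterized linear model. The potential $\psi(w) = \frac{1}{q}\sum_i |w_i|^q$, the squared loss, the function names, and the default step size are all assumptions chosen for illustration; with q = 2 the update reduces to plain SGD, the special case the abstract mentions.

```python
import numpy as np

def mirror(w, q):
    """Gradient of the assumed potential psi(w) = (1/q) * sum(|w_i|^q), q > 1."""
    return np.sign(w) * np.abs(w) ** (q - 1)

def inverse_mirror(z, q):
    """Inverse of the mirror map above."""
    return np.sign(z) * np.abs(z) ** (1.0 / (q - 1))

def smd_step(w, grad, lr, q):
    """One mirror descent step: move in the mirror (dual) space, then map back."""
    return inverse_mirror(mirror(w, q) - lr * grad, q)

def run_smd(X, y, q=2.0, lr=0.02, steps=5000, seed=0):
    """SMD on the squared loss 0.5 * (x_i . w - y_i)^2, one random sample per step."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.full(d, 1e-3)  # small nonzero initialization (an arbitrary choice)
    for _ in range(steps):
        i = rng.integers(n)
        residual = X[i] @ w - y[i]
        w = smd_step(w, residual * X[i], lr, q)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((5, 20))  # 5 samples, 20 parameters: overparameterized
    y = rng.standard_normal(5)
    w_hat = run_smd(X, y)
    print("training residual norm:", np.linalg.norm(X @ w_hat - y))
```

Changing `q` changes which interpolating solution the iterates approach; that dependence on the potential is the implicit regularization the abstract studies. Other values of `q` may need a different step size.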

2. Network Pruning Using Adaptive Exemplar Filters.

Author: Lin, Mingbao, Ji, Rongrong, Li, Shaojie, Wang, Yan, Wu, Yongjian, Huang, Feiyue, and Ye, Qixiang
Subjects: *ADAPTIVE filters, *MESSAGE passing (Computer science), *COMMUNITIES, *COMPUTER architecture
Abstract: Popular network pruning algorithms reduce redundant information by optimizing hand-crafted models, which may yield suboptimal performance and make filter selection slow. We introduce adaptive exemplar filters to simplify the algorithm design, resulting in an automatic and efficient pruning approach called EPruner. Inspired by the face recognition community, we apply the message-passing algorithm Affinity Propagation to the weight matrices to obtain an adaptive number of exemplars, which then act as the preserved filters. EPruner removes the dependence on the training data in determining the “important” filters and allows a CPU implementation that runs in seconds, an order of magnitude faster than GPU-based SOTAs. Moreover, we show that the weights of exemplars provide a better initialization for fine-tuning. On VGGNet-16, EPruner achieves a 76.34% FLOPs reduction by removing 88.80% of the parameters, with a 0.06% accuracy improvement on CIFAR-10. On ResNet-152, EPruner achieves a 65.12% FLOPs reduction by removing 64.18% of the parameters, with only a 0.71% top-5 accuracy loss on ILSVRC-2012. Our code is available at https://github.com/lmbxmu/EPruner. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

3. Consistent Meta-Regularization for Better Meta-Knowledge in Few-Shot Learning.

Author: Tian, Pinzhuo, Li, Wenbin, and Gao, Yang
Subjects: *MACHINE learning, *DEEP learning, *TECHNOLOGICAL innovations
Abstract: Recently, meta-learning has provided a powerful paradigm for dealing with the few-shot learning problem. However, existing meta-learning approaches ignore the prior fact that good meta-knowledge should alleviate the inconsistency between training and test data that the extremely limited data causes in each few-shot learning task. Moreover, properly utilizing this prior understanding of meta-knowledge can lead to an efficient method for improving the meta-learning model. We therefore consider the data inconsistency from the distribution perspective, which makes it convenient to bring in the prior fact, and propose a new consistent meta-regularization (Con-MetaReg) to help the meta-learning model learn how to reduce the data-distribution discrepancy between the training and test data. In this way, the ability of meta-knowledge to keep the training and test data consistent is enhanced, and the performance of the meta-learning model can be further improved. Extensive analyses and experiments demonstrate that our method can indeed improve the performance of different meta-learning models in few-shot regression, classification, and fine-grained classification. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

4. Elastic Net Nonparallel Hyperplane Support Vector Machine and Its Geometrical Rationality.

Author: Qi, Kai and Yang, Hu
Subjects: *QUADRATIC programming, *PETRI nets, *SUPPORT vector machines, *HYPERPLANES
Abstract: Twin support vector machine (TWSVM), which constructs two nonparallel classifying hyperplanes, is widely applied in various fields. However, TWSVM solves two quadratic programming problems (QPPs) separately, so the final classifiers lack consistency and sufficient prediction accuracy. Moreover, because it considers only the 1-norm penalty for slack variables, TWSVM is not well defined from a geometrical point of view. In this article, we propose a novel elastic net nonparallel hyperplane support vector machine (ENNHSVM), which adopts an elastic net penalty for slack variables and constructs two nonparallel separating hyperplanes simultaneously. We further discuss the properties of ENNHSVM theoretically and derive the violation tolerance upper bound to better demonstrate the relative violations of training samples in the same class. In particular, we design a safe screening rule for ENNHSVM to speed up the calculations. We finally compare the performance of ENNHSVM on both synthetic datasets and benchmark datasets with the Lagrangian SVM, the twin parametric-margin SVM, the elastic net SVM, the TWSVM, and the nonparallel hyperplane SVM. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

5. A Sensitivity-Based Data Augmentation Framework for Model Predictive Control Policy Approximation.

Author: Krishnamoorthy, Dinesh
Subjects: *DATA augmentation, *SUPERVISED learning, *PREDICTION models, *APPROXIMATION algorithms, *DEEP learning
Abstract: Approximating a model predictive control (MPC) policy using expert-based supervised learning techniques requires labeled training datasets sampled from the MPC policy. These are typically obtained by sampling the feasible state space and evaluating the control law by solving the numerical optimization problem offline for each sample. Although the resulting approximate policy can be cheaply evaluated online, generating large training sets to learn the MPC policy can be time-consuming and prohibitively expensive. This is one of the fundamental bottlenecks that limit the design and implementation of MPC policy approximation. This technical article aims to address this challenge and proposes a novel sensitivity-based data augmentation scheme for direct policy approximation. The proposed approach exploits the parametric sensitivities to cheaply generate additional training samples in the neighborhood of the existing samples. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

6. Class-Imbalanced Deep Learning via a Class-Balanced Ensemble.

Author: Chen, Zhi, Duan, Jiang, Kang, Li, and Qiu, Guoping
Subjects: *DEEP learning, *CONVOLUTIONAL neural networks
Abstract: Class imbalance is a prevalent phenomenon in various real-world applications and it presents significant challenges to model learning, including deep learning. In this work, we embed ensemble learning into deep convolutional neural networks (CNNs) to tackle the class-imbalanced learning problem. An ensemble of auxiliary classifiers branching out from various hidden layers of a CNN is trained together with the CNN in an end-to-end manner. To that end, we designed a new loss function that can rectify the bias toward the majority classes by forcing the CNN’s hidden layers and their associated auxiliary classifiers to focus on the samples that have been misclassified by previous layers, thus enabling subsequent layers to develop diverse behavior and fix the errors of previous layers in a batch-wise manner. A unique feature of the new method is that the ensemble of auxiliary classifiers can either work together with the main CNN to form a more powerful combined classifier, or be removed once training of the CNN has finished, in which case it serves only to assist the CNN's class-imbalanced learning and enhance the network’s capability in dealing with class-imbalanced data. Comprehensive experiments are conducted on four benchmark data sets of increasing complexity (CIFAR-10, CIFAR-100, iNaturalist, and CelebA) and the results demonstrate significant performance improvements over the state-of-the-art deep imbalance learning methods. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

7. Whitening-Net: A Generalized Network to Diagnose the Faults Among Different Machines and Conditions.

Author: Li, Jie, Wang, Yu, Zi, Yanyang, and Zhang, Zhijie
Subjects: *DATA mining, *MACHINERY, *FAULT diagnosis, *FEATURE extraction, *BEARINGS (Machinery)
Abstract: Intelligent bearing diagnostic methods are developing rapidly, but they are difficult to implement due to the lack of real industrial data. A feasible way to deal with this problem is to train a network on laboratory data to mine the causality of bearing faults. This means that the constructed network must handle domain deviations caused by changes of machines, working conditions, noise, and so on, which is not a simple task. In response to this problem, a new domain generalization framework, Whitening-Net, is proposed in this article. The framework first defines the homologous compound domain signal as the data basis. Subsequently, a causal loss is proposed to impose regularization constraints on the network, which enhances the network’s ability to mine causality. To prevent domain-specific information from interfering with causal mining, a whitening structure is proposed to whiten the domain, prompting the network to pay more attention to the causality of the signal rather than the domain noise. The results of diagnosis and interpretation demonstrate the ability of Whitening-Net to mine causal mechanisms, which shows that the proposed network can generalize to different machines, even if the tested working conditions and bearing types are completely different from the training domains. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

8. Adaptive Data Structure Regularized Multiclass Discriminative Feature Selection.

Author: Fan, Mingyu, Zhang, Xiaoqin, Hu, Jie, Gu, Nannan, and Tao, Dacheng
Subjects: *FEATURE selection, *SMART structures, *DATA distribution, *SUPERVISED learning, *DATA structures
Abstract: Feature selection (FS), which aims to identify the most informative subset of input features, is an important approach to dimensionality reduction. In this article, a novel FS framework is proposed for both unsupervised and semisupervised scenarios. To make efficient use of data distribution to evaluate features, the framework combines data structure learning (also referred to as data distribution modeling) and FS in a unified formulation such that the data structure learning improves the results of FS and vice versa. Moreover, two types of data structures, namely the soft and hard data structures, are learned and used in the proposed FS framework. The soft data structure refers to the pairwise weights among data samples, and the hard data structure refers to the estimated labels obtained from clustering or semisupervised classification. Both of these data structures are naturally formulated as regularization terms in the proposed framework. In the optimization process, the soft and hard data structures are learned from data represented by the selected features, and then the most informative features are reselected by referring to the data structures. In this way, the framework uses the interactions between data structure learning and FS to select the most discriminative and informative features. Following the proposed framework, a new semisupervised FS (SSFS) method is derived and studied in depth. Experiments on real-world data sets demonstrate the effectiveness of the proposed method. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

9. A Segment-Based Drift Adaptation Method for Data Streams.

Author: Song, Yiliao, Lu, Jie, Liu, Anjin, Lu, Haiyan, and Zhang, Guangquan
Subjects: *DISTRIBUTION (Probability theory), *ONLINE education, *TASK analysis, *PHYSIOLOGICAL adaptation
Abstract: In concept drift adaptation, we aim to design a blind or an informed strategy to update our best predictor for future data at each time point. However, existing informed drift adaptation methods need to wait for an entire batch of data to detect drift and then update the predictor (if drift is detected), which causes adaptation delay. To overcome the adaptation delay, we propose a sequentially updated statistic, called drift-gradient, to quantify the increase of distributional discrepancy each time a new instance arrives. Based on drift-gradient, a segment-based drift adaptation (SEGA) method is developed to update our best predictor online. Drift-gradient is defined on a segment in the training set. It can precisely quantify the increase of distributional discrepancy between the old segment and the newest segment when only one new instance is available at each time point. A lower value of drift-gradient on the old segment indicates that the distribution of the new instance is closer to the distribution of the old segment. Based on the drift-gradient, SEGA retrains our best predictors with the segments that have the minimum drift-gradient each time a new instance arrives. SEGA has been validated by extensive experiments on both synthetic and real-world classification and regression data streams. The experimental results show that SEGA outperforms competitive blind and informed drift adaptation methods. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

10. Rollback Ensemble With Multiple Local Minima in Fine-Tuning Deep Learning Networks.

Author: Ro, Youngmin, Choi, Jongwon, Heo, Byeongho, and Choi, Jin Young
Subjects: *DEEP learning, *ARTIFICIAL neural networks, *IMAGE retrieval, *GENERATIVE adversarial networks, *INFORMATION retrieval
Abstract: Image retrieval is a challenging problem that requires learning features general enough to identify untrained classes, even with very few classwise training samples. In this article, to obtain more generalized features when learning retrieval data sets, we propose a novel method for fine-tuning pretrained deep networks. In the retrieval task, we discovered a phenomenon in which the loss reduction in fine-tuning deep networks stagnates, even while the weights are still being updated substantially. To escape from the stagnated state, we propose a new fine-tuning strategy that rolls back some of the weights to their pretrained values. The rollback scheme is observed to drive the learning path to a gentle basin that provides more generalized features than a sharp basin. In addition, we propose a multihead ensemble structure to create synergy among multiple local minima obtained by our rollback scheme. Experimental results show that the proposed learning method significantly improves generalization performance, achieving state-of-the-art performance on the Inshop and SOP data sets. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

11. Adversarial Binary Mutual Learning for Semi-Supervised Deep Hashing.

Author: Wang, Guan'An, Hu, Qinghao, Yang, Yang, Cheng, Jian, and Hou, Zeng-Guang
Subjects: *DEEP learning, *HAMMING distance, *SUPERVISED learning, *WEIBULL distribution, *SEARCH algorithms, *KNOWLEDGE transfer, *BINARY codes
Abstract: Hashing is a popular search algorithm for its compact binary representation and efficient Hamming distance calculation. Benefiting from advances in deep learning, deep hashing methods have achieved promising performance. However, those methods usually learn with expensive labeled data but fail to utilize unlabeled data. Furthermore, the traditional pairwise loss used by those methods cannot explicitly force similar/dissimilar pairs to small/large distances. Both weaknesses limit existing methods’ performance. To solve the first problem, we propose a novel semi-supervised deep hashing model named adversarial binary mutual learning (ABML). Specifically, our ABML consists of a generative model $G_H$ and a discriminative model $D_H$, where $D_H$ learns labeled data in a supervised way and $G_H$ learns unlabeled data by synthesizing real images. We adopt an adversarial learning (AL) strategy to transfer the knowledge of unlabeled data to $D_H$ by making $G_H$ and $D_H$ mutually learn from each other. To solve the second problem, we propose a novel Weibull cross-entropy loss (WCE) by using the Weibull distribution, which can distinguish tiny differences of distances and explicitly force similar/dissimilar distances to be as small/large as possible. Thus, the learned features are more discriminative. Finally, by incorporating ABML with WCE loss, our model can acquire more semantic and discriminative features. Extensive experiments on four common data sets (CIFAR-10, large database of handwritten digits (MNIST), ImageNet-10, and NUS-WIDE) and a large-scale data set ImageNet demonstrate that our approach successfully overcomes the two difficulties above and significantly outperforms state-of-the-art hashing methods. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

12. Global Negative Correlation Learning: A Unified Framework for Global Optimization of Ensemble Models.

Author: Perales-Gonzalez, Carlos, Fernandez-Navarro, Francisco, Carbonero-Ruz, Mariano, and Perez-Rodriguez, Javier
Subjects: *GLOBAL optimization, *ERROR functions, *ANALYTICAL solutions, *MACHINE learning, *LEARNING communities, *RADIAL basis functions
Abstract: Ensembles are a widely implemented approach in the machine learning community and their success is traditionally attributed to the diversity within the ensemble. Most of these approaches foster diversity in the ensemble by data sampling or by modifying the structure of the constituent models. Despite this, there is a family of ensemble models in which diversity is explicitly promoted in the error function of the individuals. The negative correlation learning (NCL) ensemble framework is probably the most well-known algorithm within this group of methods. This article analyzes NCL and reveals that the framework actually minimizes the combination of errors of the individuals of the ensemble instead of minimizing the residuals of the final ensemble. We propose a novel ensemble framework, named global negative correlation learning (GNCL), which focuses on the optimization of the global ensemble instead of the individual fitness of its components. An analytical solution for the parameters of base regressors based on the NCL framework and the global error function proposed is also provided under the assumption of fixed basis functions (although the general framework could also be instantiated for neural networks with nonfixed basis functions). The proposed ensemble framework is evaluated by extensive experiments with regression and classification data sets. Comparisons with other state-of-the-art ensemble methods confirm that GNCL yields the best overall performance. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

13. Imbalanced Data Classification via Cooperative Interaction Between Classifier and Generator.

Author: Choi, Hyun-Soo, Jung, Dahuin, Kim, Siwon, and Yoon, Sungroh
Subjects: *GENERATIVE adversarial networks, *FOLKSONOMIES, *DEEP learning, *GALLIUM nitride
Abstract: Classifiers learned from imbalanced data can be strongly biased toward the majority class. To address this issue, several methods have been proposed using generative adversarial networks (GANs). Existing GAN-based methods, however, do not effectively utilize the relationship between a classifier and a generator. This article proposes a novel three-player structure consisting of a discriminator, a generator, and a classifier, along with decision boundary regularization. Our method is distinctive in that the generator is trained in cooperation with the classifier to provide minority samples that gradually expand the minority decision region, improving performance for imbalanced data classification. The proposed method outperforms the existing methods on real data sets as well as synthetic imbalanced data sets. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

14. Transductive Semisupervised Deep Hashing.

Author: Shi, Weiwei, Gong, Yihong, Chen, Badong, and Hei, Xinhong
Subjects: *CONVOLUTIONAL neural networks, *ARTIFICIAL neural networks, *HAMMING distance, *IMAGE retrieval, *COMPUTER programming education
Abstract: Deep hashing methods have shown their superiority to traditional ones. However, they usually require a large amount of labeled training data to achieve high retrieval accuracy. We propose a novel transductive semisupervised deep hashing (TSSDH) method that effectively trains deep convolutional neural network (DCNN) models with both labeled and unlabeled training samples. The TSSDH method consists of the following four main ingredients. First, we extend the traditional transductive learning (TL) principle to make it applicable to DCNN-based deep hashing. Second, we introduce confidence levels for unlabeled samples to reduce adverse effects from uncertain samples. Third, we employ a Gaussian likelihood loss for hash code learning to sufficiently penalize large Hamming distances for similar sample pairs. Fourth, we design the large-margin feature (LMF) regularization so that, in the learned feature space, the distances of similar sample pairs are minimized and the distances of dissimilar sample pairs are larger than a predefined margin. Comprehensive experiments show that the TSSDH method produces superior image retrieval accuracy compared to representative semisupervised deep hashing methods under the same number of labeled training samples. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

15. AI Takes a Dumpster Dive: Computer-vision systems sort your recyclables at superhuman speed.

Subjects: *COMPUTER vision, *SPEED, *YOGURT, *ENVIRONMENTAL management, *CONTAINERS
Abstract: It's Tuesday night. In front of your house sits a large blue bin, full of newspaper, cardboard, bottles, cans, foil take-out trays, and empty yogurt containers. You may feel virtuous, thinking you're doing your part to reduce waste. But after you rinse out that yogurt container and toss it into the bin, you probably don't think much about it ever again. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

16. Attract–Repel Encoder: Learning Anomaly Representation Away From Landmarks.

Author: Zhao, Jiachen, Deng, Fang, Li, Yongling, and Chen, Jie
Subjects: *DEEP learning, *ANOMALY detection (Computer security), *DATA mining
Abstract: Anomaly detection (AD) has attracted great interest in the data mining community. With the development of deep learning, various deep autoencoders have been used and modified to solve AD problems due to their efficient data coding and reconstruction mechanisms. However, such methods still face challenges when solving some practical AD tasks. On the one hand, an AD dataset may contain diverse normal patterns rather than a universal pattern. Specifically, the normal data are usually distributed across multiple clusters; meanwhile, the exact number of clusters is hard to know in practice. On the other hand, most existing autoencoder-based methods focus on encoding normal features but do not explore the characteristics of abnormal data. To tackle these challenges, this article proposes a novel autoencoder-based AD model, the attract–repel encoder (ARE). ARE selects some landmarks in the encoding space to represent the diverse normal patterns. In addition, ARE can adaptively update the landmarks and their number during training. This article then proposes the attract–repel loss (AR loss) function to train ARE. AR loss attracts normal samples to landmarks and repels anomalies away from landmarks so that it can learn both normal and abnormal features. Finally, ARE computes a sample’s anomaly score by summing up its reconstruction error and its distance to the landmarks. Moreover, ARE can be trained in either a semisupervised or an unsupervised manner. This article presents comprehensive experiments to evaluate the effectiveness of our approach. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

17. Triple-Memory Networks: A Brain-Inspired Method for Continual Learning.

Author: Wang, Liyuan, Lei, Bo, Li, Qian, Su, Hang, Zhu, Jun, and Zhong, Yi
Subjects: *GENERATIVE adversarial networks, *ARTIFICIAL neural networks, *BIOLOGICAL neural networks
Abstract: Continual acquisition of novel experience without interfering with previously learned knowledge, i.e., continual learning, is critical for artificial neural networks but is limited by catastrophic forgetting. A neural network adjusts its parameters when learning a new task but then fails to perform the old tasks well. By contrast, the biological brain can effectively address catastrophic forgetting by consolidating memories into more specific or more generalized forms that complement each other, which is achieved through the interplay of the hippocampus and neocortex, mediated by the prefrontal cortex. Inspired by such a brain strategy, we propose a novel approach named triple-memory networks (TMNs) for continual learning. TMNs model the interplay of the three brain regions as a triple-network architecture of generative adversarial networks (GANs). The input information is encoded as specific representations of data distributions in a generator, or as generalized knowledge of solving tasks in a discriminator and a classifier, with appropriate brain-inspired algorithms implemented in each module to alleviate catastrophic forgetting. In particular, the generator replays generated data of the learned tasks to the discriminator and the classifier, both of which are implemented with a weight consolidation regularizer to compensate for the information lost in the generation process. TMNs achieve state-of-the-art performance for generative memory replay on a variety of class-incremental learning benchmarks on MNIST, SVHN, CIFAR-10, and ImageNet-50. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

18. Deep Echo State Q-Network (DEQN) and Its Application in Dynamic Spectrum Sharing for 5G and Beyond.

Author: Chang, Hao-Hsuan, Liu, Lingjia, and Yi, Yang
Subjects: *RECURRENT neural networks, *5G networks, *REINFORCEMENT learning, *SPECTRUM allocation, *ECHO, *WIRELESS sensor networks
Abstract: Deep reinforcement learning (DRL) has been shown to be successful in many application domains. Combining recurrent neural networks (RNNs) and DRL further enables DRL to be applicable in non-Markovian environments by capturing temporal information. However, training of both DRL and RNNs is known to be challenging, requiring a large amount of training data to achieve convergence. In many targeted applications, such as those used in fifth-generation (5G) cellular communication, the environment is highly dynamic, while the available training data is very limited. Therefore, it is extremely important to develop DRL strategies that are capable of capturing the temporal correlation of the dynamic environment while requiring limited training overhead. In this article, we introduce the deep echo state Q-network (DEQN), which can adapt to the highly dynamic environment in a short period of time with limited training data. We evaluate the performance of the introduced DEQN method under the dynamic spectrum sharing (DSS) scenario, which is a promising technology in 5G and future 6G networks to increase the spectrum utilization. Compared with conventional spectrum management policy that grants a fixed spectrum band to a single system for exclusive access, DSS allows the secondary system to share the spectrum with the primary system. Our work sheds light on the application of an efficient DRL framework in highly dynamic environments with limited available training data. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

19. Multitask Representation Learning With Multiview Graph Convolutional Networks.

Author: Huang, Hong, Song, Yu, Wu, Yao, Shi, Jia, Xie, Xia, and Jin, Hai
Subjects: *TASK performance
Abstract: Link prediction and node classification are two important downstream tasks of network representation learning. Existing methods have achieved acceptable results, but they perform these two tasks separately, which duplicates a lot of work and ignores the correlations between tasks. In addition, conventional models treat the information from multiple views identically, so they fail to learn robust representations for downstream tasks. To this end, we tackle link prediction and node classification problems simultaneously via multitask multiview learning in this article. We first explain the feasibility and advantages of multitask multiview learning for these two tasks. Then we propose a novel model named MT-MVGCN to perform link prediction and node classification tasks simultaneously. More specifically, we design a multiview graph convolutional network to extract abundant information from multiple views in a network, which is shared by different tasks. We further apply two attention mechanisms, a view attention mechanism and a task attention mechanism, to let views and tasks adjust the view fusion process. Moreover, view reconstruction can be introduced as an auxiliary task to boost the performance of the proposed model. Experiments on real-world network data sets demonstrate that our model is both efficient and effective, and outperforms advanced baselines in these two tasks. [ABSTRACT FROM AUTHOR]
Published: 2022
Full Text: View/download PDF

20. Sparse ℓ1- and ℓ2-Center Classifiers.

Author: Calafiore, Giuseppe C. and Fracastoro, Giulia
Subjects: *FEATURE selection
Abstract: In this article, we discuss two novel sparse versions of the classical nearest-centroid classifier. The proposed sparse classifiers are based on $\ell_1$ and $\ell_2$ distance criteria, respectively, and perform simultaneous feature selection and classification by detecting the features that are most relevant for the classification purpose. We formally prove that the training of the proposed sparse models, with both distance criteria, can be performed exactly (i.e., the globally optimal set of features is selected) at a linear computational cost. Specifically, the proposed sparse classifiers are trained in $O(mn)+O(m\log k)$ operations, where $n$ is the number of samples, $m$ is the total number of features, and $k \leq m$ is the number of features to be retained in the classifier. Furthermore, the complexity of testing and classifying a new sample is simply $O(k)$ for both methods. The proposed models can be employed either as stand-alone sparse classifiers or as fast feature-selection techniques for prefiltering the features to be later fed to other types of classifiers (e.g., SVMs). The experimental results show that the proposed methods are competitive in accuracy with state-of-the-art feature selection and classification techniques while having a substantially lower computational cost. [ABSTRACT FROM AUTHOR]
Published: 2022
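
Illustrative note on record 20: the abstract describes a nearest-centroid classifier that keeps only $k$ of $m$ features. The following is a minimal two-class sketch of that general idea, not the authors' algorithm. The selection rule (keep the $k$ features with the largest squared gap between class centroids) and the function names `fit_sparse_centroids` and `predict` are assumptions made for illustration only.

```python
import heapq
from typing import List, Sequence, Tuple


def fit_sparse_centroids(
    X: Sequence[Sequence[float]], y: Sequence[int], k: int
) -> Tuple[List[int], List[float], List[float]]:
    """Fit a two-class (labels 0/1) nearest-centroid model on k features.

    Assumed selection rule (hypothetical, for illustration): keep the k
    features whose class centroids differ most in squared distance.
    """
    m = len(X[0])
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    # Per-feature class means: O(m * n) work.
    c_pos = [sum(x[j] for x in pos) / len(pos) for j in range(m)]
    c_neg = [sum(x[j] for x in neg) / len(neg) for j in range(m)]
    gaps = [(c_pos[j] - c_neg[j]) ** 2 for j in range(m)]
    # Heap-based top-k selection: O(m log k) work.
    keep = sorted(heapq.nlargest(k, range(m), key=gaps.__getitem__))
    return keep, [c_pos[j] for j in keep], [c_neg[j] for j in keep]


def predict(
    x: Sequence[float], keep: List[int], c_pos: List[float], c_neg: List[float]
) -> int:
    """Classify x by squared Euclidean distance on the kept features: O(k)."""
    d_pos = sum((x[j] - c) ** 2 for j, c in zip(keep, c_pos))
    d_neg = sum((x[j] - c) ** 2 for j, c in zip(keep, c_neg))
    return 1 if d_pos < d_neg else 0


# Toy data: only feature 0 separates the classes.
X = [[0.0, 0.0, 5.0], [0.0, 1.0, 5.0], [4.0, 0.0, 5.0], [4.0, 1.0, 5.0]]
y = [0, 0, 1, 1]
keep, c_pos, c_neg = fit_sparse_centroids(X, y, k=1)
```

On this toy data the sketch keeps feature 0 alone, so a test point is classified from a single coordinate regardless of its other entries.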

21. Multilayer Spectral–Spatial Graphs for Label Noisy Robust Hyperspectral Image Classification.

Author: Jiang, Junjun, Ma, Jiayi, and Liu, Xianming
Subjects: *GRAPH labelings, *SOURCE code, *CLASSIFICATION, *NOISE measurement, *IMAGE segmentation, *FOOD labeling
Abstract: In hyperspectral image (HSI) analysis, label information is a scarce resource and it is unavoidably affected by human and nonhuman factors, resulting in a large amount of label noise. Although most of the recent supervised HSI classification methods have achieved good classification results, their performance drastically decreases when the training samples contain label noise. To address this issue, we propose a label noise cleansing method based on spectral–spatial graphs (SSGs). In particular, an affinity graph is constructed based on spectral and spatial similarity, in which pixels in a superpixel segmentation-based homogeneous region are connected, and their similarities are measured by spectral feature vectors. Then, we use the constructed affinity graph to regularize the process of label noise cleansing. In this manner, we transform label noise cleansing to an optimization problem with a graph constraint. To fully utilize spatial information, we further develop multiscale segmentation-based multilayer SSGs (MSSGs). MSSGs can efficiently merge the complementary information of multilayer graphs and thus provide richer spatial information compared with any single-layer graph obtained from isolation segmentation. Experimental results show that MSSG reduces the level of label noise. Compared with the state of the art, the proposed MSSG method exhibits significantly enhanced classification accuracy toward the training data with noisy labels. The significant advantages of the proposed method over four major classifiers are also demonstrated. The source code is available at https://github.com/junjun-jiang/MSSG. [ABSTRACT FROM AUTHOR]
Published: 2022

22. Knowledge Distillation for Face Photo–Sketch Synthesis.

Author: Zhu, Mingrui, Li, Jie, Wang, Nannan, and Gao, Xinbo
Subjects: *GENERATIVE adversarial networks, *CONVOLUTIONAL neural networks, *MACHINE learning, *KNOWLEDGE transfer, *GALLIUM nitride, *FACE
Abstract: Significant progress has been made with face photo–sketch synthesis in recent years due to the development of deep convolutional neural networks, particularly generative adversarial networks (GANs). However, the performance of existing methods is still limited because of the lack of training data (photo–sketch pairs). To address this challenge, we investigate the effect of knowledge distillation (KD) on training neural networks for the face photo–sketch synthesis task and propose an effective KD model to improve the performance of synthetic images. In particular, we utilize a teacher network trained on a large amount of data in a related task to separately learn knowledge of the face photo and knowledge of the face sketch and simultaneously transfer this knowledge to two student networks designed for the face photo–sketch synthesis task. In addition to assimilating the knowledge from the teacher network, the two student networks can mutually transfer their own knowledge to further enhance their learning. To further enhance the perception quality of the synthetic image, we propose a KD+ model that combines GANs with KD. The generator can produce images with more realistic textures and less noise under the guidance of knowledge. Extensive experiments and a user study demonstrate the superiority of our models over the state-of-the-art methods. [ABSTRACT FROM AUTHOR]
Published: 2022

23. Topic-Based Instance and Feature Selection in Multilabel Classification.

Author: Ma, Jianghong and Chow, Tommy W. S.
Subjects: *FEATURE selection, *CLASSIFICATION, *FEATURE extraction
Abstract: Multilabel learning has been extensively studied in the past years, as it has many applications in different domains. It aims at annotating the labels for unseen data according to training data, which are often high dimensional in both instance and feature levels. The training data often have noisy and redundant information on these two levels. As an effective data preprocessing step, instance and feature selection should both be performed to find relevant training instances for each testing instance and relevant features for each label, respectively. However, most of the existing methods overlook the input–output correlation in each kind of selection, which leads to performance degradation. This article presents a formulation for multilabel learning from a topic view that exploits the dependence between features and labels in a topic space. We can perform effective instance and feature selection in the latent topic space, as the relationship between the input and output spaces is well captured in this space. The results from intensive experiments on various benchmarks demonstrate the effectiveness of the proposed framework. [ABSTRACT FROM AUTHOR]
Published: 2022

24. An Ensemble Broad Learning Scheme for Semisupervised Vehicle Type Classification.

Author: Guo, Li, Li, Runze, and Jiang, Bin
Subjects: *SUPERVISED learning, *INTELLIGENT transportation systems, *TRAFFIC monitoring, *TRAFFIC flow, *CLASSIFICATION, *AUTONOMOUS vehicles
Abstract: Nowadays vehicle type classification is a fundamental part of intelligent transportation systems (ITSs) and is widely used in various applications such as traffic flow monitoring, security enforcement, and autonomous driving. However, vehicle classification is usually performed with supervised learning, which greatly limits its applicability to real ITSs. This article proposes a semisupervised vehicle type classification scheme via ensemble broad learning for ITS. The presented method contains two main parts. In the first part, a collection of base broad learning system (BLS) classifiers is trained by semisupervised learning to avoid a time-consuming training process and alleviate the burden of the increasing number of unlabeled samples. In the second part, a dynamic ensemble structure constructed from trained classifier groups with different characteristics obtains the highest type probability and determines the type to which the vehicle belongs, so as to achieve better generalization performance than a single base classifier. Several experiments conducted on the public BIT-Vehicle dataset and MIO-TCD dataset demonstrate that the proposed method outperforms a single BLS classifier and some mainstream methods in effectiveness and efficiency. [ABSTRACT FROM AUTHOR]
Published: 2021

25. Class-Variant Margin Normalized Softmax Loss for Deep Face Recognition.

Author: Zhang, Wanping, Chen, Yongru, Yang, Wenming, Wang, Guijin, Xue, Jing-Hao, and Liao, Qingmin
Subjects: *DEEP learning, *HUMAN facial recognition software, *COMPUTATIONAL complexity, *FEATURE extraction
Abstract: In deep face recognition, the commonly used softmax loss and its newly proposed variations are not yet sufficiently effective to handle the class imbalance and softmax saturation issues during the training process while extracting discriminative features. In this brief, to address both issues, we propose a class-variant margin (CVM) normalized softmax loss, by introducing a true-class margin and a false-class margin into the cosine space of the angle between the feature vector and the class-weight vector. The true-class margin alleviates the class imbalance problem, and the false-class margin postpones the early individual saturation of softmax. With negligible computational complexity increment during training, the new loss function is easy to implement in the common deep learning frameworks. Comprehensive experiments on the LFW, YTF, and MegaFace protocols demonstrate the effectiveness of the proposed CVM loss function. [ABSTRACT FROM AUTHOR]
Published: 2021

26. A Layer-Wise Data Augmentation Strategy for Deep Learning Networks and Its Soft Sensor Application in an Industrial Hydrocracking Process.

Author: Yuan, Xiaofeng, Ou, Chen, Wang, Yalin, Yang, Chunhua, and Gui, Weihua
Subjects: *DEEP learning, *DATA augmentation, *SENSOR networks, *MANUFACTURING processes, *BOILING-points, *ARTIFICIAL neural networks
Abstract: In industrial processes, inferential sensors have been extensively applied for prediction of quality variables that are difficult to measure online directly by hard sensors. Deep learning is a recently developed technique for feature representation of complex data, which has great potential in soft sensor modeling. However, it often needs a large amount of representative data to train and obtain a good deep network. Moreover, layer-wise pretraining often causes information loss and generalization degradation of high hidden layers. This greatly limits the implementation and application of deep learning networks in industrial processes. In this article, a layer-wise data augmentation (LWDA) strategy is proposed for the pretraining of deep learning networks and soft sensor modeling. In particular, the LWDA-based stacked autoencoder (LWDA-SAE) is developed in detail. Finally, the proposed LWDA-SAE model is applied to predict the 10% and 50% boiling points of the aviation kerosene in an industrial hydrocracking process. The results show that the LWDA-SAE-based soft sensor is superior to multilayer perceptron, traditional SAE, and the SAE with data augmentation only for its input layer (IDA-SAE). Moreover, LWDA-SAE can converge at a faster speed with a lower learning error than the other methods. [ABSTRACT FROM AUTHOR]
Published: 2021

27. Intelligent Trainer for Dyna-Style Model-Based Deep Reinforcement Learning.

Author: Dong, Linsen, Li, Yuanlong, Zhou, Xin, Wen, Yonggang, and Guan, Kyle
Subjects: *REINFORCEMENT learning, *DEEP learning, *SYSTEM dynamics, *MARKOV processes, *INSTRUCTIONAL systems, *SELF-tuning controllers
Abstract: Model-based reinforcement learning (MBRL) has been proposed as a promising alternative solution to tackle the high sampling cost challenge in the canonical RL, by leveraging a system dynamics model to generate synthetic data for policy training purpose. The MBRL framework, nevertheless, is inherently limited by the convoluted process of jointly optimizing control policy, learning system dynamics, and sampling data from two sources controlled by complicated hyperparameters. As such, the training process involves overwhelmingly manual tuning and is prohibitively costly. In this research, we propose a “reinforcement on reinforcement” (RoR) architecture to decompose the convoluted tasks into two decoupled layers of RL. The inner layer is the canonical MBRL training process which is formulated as a Markov decision process, called training process environment (TPE). The outer layer serves as an RL agent, called intelligent trainer, to learn an optimal hyperparameter configuration for the inner TPE. This decomposition approach provides much-needed flexibility to implement different trainer designs, referred to as “train the trainer.” In our research, we propose and optimize two alternative trainer designs: 1) a unihead trainer and 2) a multihead trainer. Our proposed RoR framework is evaluated for five tasks in the OpenAI gym. Compared with three other baseline methods, our proposed intelligent trainer methods have a competitive performance in autotuning capability, with up to 56% expected sampling cost saving without knowing the best parameter configurations in advance. The proposed trainer framework can be easily extended to tasks that require costly hyperparameter tuning. [ABSTRACT FROM AUTHOR]
Published: 2021

28. Improved Linear Convergence of Training CNNs With Generalizability Guarantees: A One-Hidden-Layer Case.

Author: Zhang, Shuai, Wang, Meng, Xiong, Jinjun, Liu, Sijia, and Chen, Pin-Yu
Subjects: *CONVOLUTIONAL neural networks, *LEARNING problems, *GAUSSIAN distribution
Abstract: We analyze the learning problem of one-hidden-layer nonoverlapping convolutional neural networks with the rectified linear unit (ReLU) activation function from the perspective of model estimation. The training outputs are assumed to be generated by the neural network with the unknown ground-truth parameters plus some additive noise, and the objective is to estimate the model parameters by minimizing a nonconvex squared loss function of the training data. Assuming that the training set contains a finite number of samples generated from the Gaussian distribution, we prove that the accelerated gradient descent (GD) algorithm with a proper initialization converges to the ground-truth parameters (up to the noise level) with a linear rate even though the learning problem is nonconvex. Moreover, the convergence rate is proved to be faster than the vanilla GD. The initialization can be achieved by the existing tensor initialization method. In contrast to the existing works that assume an infinite number of samples, we theoretically establish the sample complexity of the required number of training samples. Although the neural network considered here is not deep, this is the first work to show that accelerated GD algorithms can find the global optimizer of the nonconvex learning problem of neural networks. This is also the first work that characterizes the sample complexity of gradient-based methods in learning convolutional neural networks with the nonsmooth ReLU activation function. This work also provides the tightest bound so far of the estimation error with respect to the output noise. [ABSTRACT FROM AUTHOR]
Published: 2021

29. Convolutional Neural Network With Developmental Memory for Continual Learning.

Author: Park, Gyeong-Moon, Yoo, Sahng-Min, and Kim, Jong-Hwan
Subjects: *CONVOLUTIONAL neural networks, *COMPUTER vision, *ARTIFICIAL neural networks, *BIOLOGICAL neural networks
Abstract: Convolutional neural networks (CNNs) are one of the most successful deep neural networks. Indeed, most of the recent applications related to computer vision are based on CNNs. However, when learning new tasks in a sequential manner, CNNs face catastrophic forgetting: they forget a considerable amount of previously learned tasks while adapting to novel tasks. To overcome this main barrier to continual learning with CNNs, we introduce developmental memory (DM) into a CNN, continually generating submemory networks to learn important features of individual tasks. A novel training method, referred to here as guided learning (GL), guides the newly generated submemory to become an expert on the new task, eventually improving the performance of the overall network. At the same time, the existing submemories attempt to preserve the knowledge of old tasks. Experiments on image classification tasks show that compared with the state-of-the-art algorithms, the proposed CNN with DM not only improves the classification performance on the new image task but also leads to less forgetting of previous image tasks to facilitate continual learning. [ABSTRACT FROM AUTHOR]
Published: 2021

30. Robust Few-Shot Learning for User-Provided Data.

Author: Lu, Jiang, Jin, Sheng, Liang, Jian, and Zhang, Changshui
Subjects: *MACHINE learning
Abstract: Few-shot learning (FSL) focuses on distilling transferrable knowledge from existing experience to cope with novel concepts for which the labeled data are scarce. A typical assumption in FSL is that the training examples of novel classes are all clean with no outlier interference. In many realistic applications where examples are provided by users, however, data are potentially noisy or unreadable. In this context, we introduce a novel research topic, robust FSL (RFSL), where we aim to address two types of outliers within user-provided data: the representation outlier (RO) and the label outlier (LO). Moreover, we introduce a metric for estimating robustness and use it to investigate the performance of several advanced FSL methods when faced with user-provided outliers. In addition, we propose robust attentive profile networks (RapNets) to achieve outlier suppression. The results of a comprehensive evaluation on benchmark data sets demonstrate the shortcomings of current FSL methods and the superiority of the proposed RapNets when dealing with RFSL problems, establishing a benchmark for follow-up studies. [ABSTRACT FROM AUTHOR]
Published: 2021

31. Self-Paced Clustering Ensemble.

Author: Zhou, Peng, Du, Liang, Liu, Xinwang, Shen, Yi-Dong, Fan, Mingyu, and Li, Xuejun
Subjects: *MACHINE learning, *LINEAR programming, *COMPUTER science
Abstract: The clustering ensemble has emerged as an important extension of the classical clustering problem. It provides an elegant framework to integrate multiple weak base clusterings to generate a strong consensus result. Most existing clustering ensemble methods exploit all data to learn a consensus clustering result, which does not sufficiently consider the adverse effects caused by some difficult instances. To handle this problem, we propose a novel self-paced clustering ensemble (SPCE) method, which gradually incorporates instances, from easy to difficult, into the ensemble learning. In our method, we integrate the evaluation of the difficulty of instances and ensemble learning into a unified framework, which can automatically estimate the difficulty of instances and ensemble the base clusterings. To optimize the corresponding objective function, we propose a joint learning algorithm to obtain the final consensus clustering result. Experimental results on benchmark data sets demonstrate the effectiveness of our method. [ABSTRACT FROM AUTHOR]
Published: 2021

32. Hardware-Algorithm Co-Design of a Compressed Fuzzy Active Learning Method.

Author: Jokar, Ehsan, Klidbary, Sajad Haghzad, Abolfathi, Hadis, Shouraki, Saeed Bagheri, Zand, Ramtin, and Ahmadi, Arash
Subjects: *ALGORITHMS, *MISO, *COMPUTER systems, *SOFT computing, *HARDWARE
Abstract: Active learning method (ALM) is a powerful fuzzy-based soft computing methodology suitable for various applications such as function modeling, control systems, clustering and classification. Despite considerable advantages, the main computational engine of ALM, ink drop spread (IDS), is memory-intensive, which imposes significant area overheads in the hardware realization of the ALM for real-time applications. In this paper, we propose a compressed model for ALM which greatly alleviates the storage limitations. The proposed approach employs a distinct inference algorithm, enabling a significant reduction in memory utilization from $O(N^{2})$ to $O(2N)$ for a multi-input single-output (MISO) system. Also, the computational costs in both training and inference modes are decreased to only a few additions and multiplications. Furthermore, we develop a memory-efficient digital architecture for the proposed compressed ALM algorithm that can be leveraged for various computing systems through configuring a few registers. Finally, we assess the performance of the proposed approach using various function modeling and classification applications and provide a comparison with conventional ALM and some other well-known approaches. Simulation and hardware implementation results demonstrate that the proposed approach achieves reduced noise sensitivity with $128\times$ reduction in the average memory usage while realizing comparable accuracy compared to the other approaches studied herein. [ABSTRACT FROM AUTHOR]
Published: 2020

33. Learning Salient and Discriminative Descriptor for Palmprint Feature Extraction and Identification.

Author: Zhao, Shuping and Zhang, Bob
Subjects: *PALMPRINT recognition, *FEATURE extraction, *LEAST squares, *MATRIX decomposition, *MOTIVATION (Psychology)
Abstract: Palmprint recognition has been widely applied in security and, particularly, authentication. In the past decade, various palmprint recognition methods have been proposed and achieved promising recognition performance. However, most of these methods require rich a priori knowledge and cannot adapt well to different palmprint recognition scenarios, including contact-based, contactless, and multispectral palmprint recognition. This problem limits the application and popularization of palmprint recognition. In this article, motivated by the least square regression, we propose a salient and discriminative descriptor learning method (SDDLM) for general scenario palmprint recognition. Different from the conventional palmprint feature extraction methods, the SDDLM jointly learns noise and salient information from the pixels of palmprint images. The learned noise enforces the projection matrix to learn salient and discriminative features from each palmprint sample. Thus, the SDDLM can be adaptive to multiscenarios. Experiments were conducted on the IITD, CASIA, GPDS, PolyU near infrared (NIR), noisy IITD, and noisy GPDS palmprint databases, and palm vein and dorsal hand vein databases. It can be seen from the experimental results that the proposed SDDLM consistently outperformed the classical palmprint recognition methods and state-of-the-art methods for palmprint recognition. [ABSTRACT FROM AUTHOR]
Published: 2020

34. Feedback Linearization Based on Gaussian Processes With Event-Triggered Online Learning.

Author: Umlauft, Jonas and Hirche, Sandra
Subjects: *GAUSSIAN processes, *ONLINE education, *SYSTEM identification, *AUTOREGRESSIVE models, *COMPUTATIONAL complexity, *GAUSSIAN mixture models
Abstract: Combining control engineering with nonparametric modeling techniques from machine learning allows for the control of systems without analytic description using data-driven models. Most of the existing approaches separate learning, i.e., the system identification based on a fixed dataset, and control, i.e., the execution of the model-based control law. This separation makes the performance highly sensitive to the initial selection of training data and possibly requires very large datasets. This article proposes a learning feedback linearizing control law using online closed-loop identification. The employed Gaussian process model updates its training data only if the model uncertainty becomes too large. This event-triggered online learning ensures high data efficiency and thereby reduces computational complexity, which is a major barrier for using Gaussian processes under real-time constraints. We propose safe forgetting strategies of data points to adhere to budget constraints and to further increase data efficiency. We show asymptotic stability for the tracking error under the proposed event-triggering law and illustrate the effective identification and control in simulation. [ABSTRACT FROM AUTHOR]
Published: 2020

35. A Network Framework for Small-Sample Learning.

Author: Liu, Dongbo, He, Zhenan, Chen, Dongdong, and Lv, Jiancheng
Subjects: *BOLTZMANN machine, *ARTIFICIAL neural networks, *NETWORK performance
Abstract: Small-sample learning involves training a neural network on a small-sample data set. An expansion of the training set is a common way to improve the performance of neural networks in small-sample learning tasks. However, improper constraints in expanding training data will reduce the performance of the neural networks. In this article, we present certain conditions for incorporation of additional training data. According to these conditions, we propose a neural network framework for self-training using self-generated data called small-sample learning network (SSLN). The SSLN consists of two parts: the expression learning network and the sample recall generative network, both of which are constructed based on restricted Boltzmann machine (RBM). We show that this SSLN can converge as well as the RBM. Moreover, the experimental results on MNIST Digit, SVHN, CIFAR10, and STL-10 data sets reveal the superiority of the SSLN over other models. [ABSTRACT FROM AUTHOR]
Published: 2020

36. Information Losses in Neural Classifiers From Sampling.

Author: Foggo, Brandon, Yu, Nanpeng, Shi, Jie, and Gao, Yuanqi
Subjects: *LARGE deviation theory, *STATISTICAL learning, *INFORMATION modeling, *INFORMATION theory
Abstract: This article considers the subject of information losses arising from the finite data sets used in the training of neural classifiers. It proves a relationship between such losses as the product of the expected total variation of the estimated neural model with the information about the feature space contained in the hidden representation of that model. It then bounds this expected total variation as a function of the size of randomly sampled data sets in a fairly general setting, and without bringing in any additional dependence on model complexity. It ultimately obtains bounds on information losses that are less sensitive to input compression and in general much smaller than existing bounds. This article then uses these bounds to explain some recent experimental findings of information compression in neural networks that cannot be explained by previous work. Finally, this article shows that not only are these bounds much smaller than existing ones, but they also correspond well with experiments. [ABSTRACT FROM AUTHOR]
Published: 2020

37. Robust and Communication-Efficient Federated Learning From Non-i.i.d. Data.

Author: Sattler, Felix, Wiedemann, Simon, Muller, Klaus-Robert, and Samek, Wojciech
Subjects: *INFORMATION commons, *CLASSROOM environment, *COLLABORATIVE learning, *IMAGE compression, *PROFESSIONAL-client communication, *DATA distribution, *DEEP learning, *DATA compression
Abstract: Federated learning allows multiple parties to jointly train a deep learning model on their combined data, without any of the participants having to reveal their local data to a centralized server. This form of privacy-preserving collaborative learning, however, comes at the cost of a significant communication overhead during training. To address this problem, several compression methods have been proposed in the distributed training literature that can reduce the amount of required communication by up to three orders of magnitude. These existing methods, however, are only of limited utility in the federated learning setting, as they either only compress the upstream communication from the clients to the server (leaving the downstream communication uncompressed) or only perform well under idealized conditions, such as i.i.d. distribution of the client data, which typically cannot be found in federated learning. In this article, we propose sparse ternary compression (STC), a new compression framework that is specifically designed to meet the requirements of the federated learning environment. STC extends the existing compression technique of top-$k$ gradient sparsification with a novel mechanism to enable downstream compression as well as ternarization and optimal Golomb encoding of the weight updates. Our experiments on four different learning tasks demonstrate that STC distinctively outperforms federated averaging in common federated learning scenarios. These results advocate for a paradigm shift in federated optimization toward high-frequency low-bitwidth communication, in particular in the bandwidth-constrained learning environments. [ABSTRACT FROM AUTHOR]
Published: 2020
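
Illustrative note on record 37: the abstract combines top-$k$ sparsification with ternarization of weight updates. A minimal sketch of those two steps follows, under stated assumptions: the shared magnitude is taken to be the mean absolute value of the kept entries, the function name `sparse_ternarize` is hypothetical, and the Golomb encoding and downstream-compression mechanism mentioned in the abstract are omitted.

```python
import heapq
from typing import List, Sequence


def sparse_ternarize(update: Sequence[float], k: int) -> List[float]:
    """Keep the k largest-magnitude entries and map them to {-mu, +mu}.

    Assumption of this sketch: mu is the mean absolute value of the
    kept entries. All other entries are set to zero.
    """
    if not 1 <= k <= len(update):
        raise ValueError("k must satisfy 1 <= k <= len(update)")
    # Indices of the k entries with the largest absolute value.
    top = heapq.nlargest(k, range(len(update)), key=lambda i: abs(update[i]))
    mu = sum(abs(update[i]) for i in top) / k
    out = [0.0] * len(update)
    for i in top:
        out[i] = mu if update[i] > 0 else -mu
    return out


compressed = sparse_ternarize([0.1, -2.0, 0.3, 4.0, -0.2], k=2)
```

After this step only the positions of the nonzero entries, their signs, and the single value `mu` would need to be transmitted, which is what makes a compact encoding of the update possible.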

38. Exact Passive-Aggressive Algorithms for Ordinal Regression Using Interval Labels.

Author: Manwani, Naresh and Chandra, Mohit
Subjects: *ONLINE algorithms, *ALGORITHMS, *LABELS, *CONVEX functions
Abstract: In this article, we propose exact passive-aggressive (PA) online algorithms for ordinal regression. The proposed algorithms can be used even when we have interval labels instead of actual labels for the examples. The proposed algorithms solve a convex optimization problem at every trial. We find an exact solution to those optimization problems to determine the updated parameters. We propose a support class algorithm (SCA) that finds the active constraints using the Karush–Kuhn–Tucker (KKT) conditions of the optimization problems. These active constraints form a support set, which determines the set of thresholds that need to be updated. We derive update rules for PA, PA-I, and PA-II. We show that the proposed algorithms maintain the ordering of the thresholds after every trial. We provide the mistake bounds of the proposed algorithms in both ideal and general settings. We also show experimentally that the proposed algorithms successfully learn accurate classifiers using interval labels as well as exact labels. The proposed algorithms also do well compared to other approaches. [ABSTRACT FROM AUTHOR]
Published: 2020

39. A Semisupervised Recurrent Convolutional Attention Model for Human Activity Recognition.

Author: Chen, Kaixuan, Yao, Lina, Zhang, Dalin, Wang, Xianzhi, Chang, Xiaojun, and Nie, Feiping
Subjects: *HUMAN activity recognition, *RECURRENT neural networks, *DEEP learning, *DATA distribution
Abstract: Recent years have witnessed the success of deep learning methods in human activity recognition (HAR). The longstanding shortage of labeled activity data inherently calls for a plethora of semisupervised learning methods, and one of the most challenging and common issues with semisupervised learning is the imbalanced distribution of labeled data over classes. Although the problem has long existed in broad real-world HAR applications, it is rarely explored in the literature. In this paper, we propose a semisupervised deep model for imbalanced activity recognition from multimodal wearable sensory data. We aim to address not only the challenges of multimodal sensor data (e.g., interperson variability and interclass similarity) but also the limited labeled data and class-imbalance issues simultaneously. In particular, we propose a pattern-balanced semisupervised framework to extract and preserve diverse latent patterns of activities. Furthermore, we exploit the independence of multi-modalities of sensory data and attentively identify salient regions that are indicative of human activities from inputs by our recurrent convolutional attention networks. Our experimental results demonstrate that the proposed model achieves a competitive performance compared to a multitude of state-of-the-art methods, both semisupervised and supervised ones, with 10% labeled training data. The results also show the robustness of our method over imbalanced, small training data sets. [ABSTRACT FROM AUTHOR]
Published: 2020

40. A Training Data Set Cleaning Method by Classification Ability Ranking for the $k$-Nearest Neighbor Classifier.

Author: Wang, Yidi, Pan, Zhibin, and Pan, Yiwei
Subjects: *DATA scrubbing, *CLASSIFICATION, *ERROR rates, *COMPUTATIONAL complexity, *MYOELECTRIC prosthesis
Abstract: The $k$-nearest neighbor (KNN) rule is a successful technique in pattern classification due to its simplicity and effectiveness. As a supervised classifier, KNN classification performance usually suffers from low-quality samples in the training data set. Thus, training data set cleaning (TDC) methods are needed for enhancing the classification accuracy by cleaning out noisy, or even wrong, samples in the original training data set. In this paper, we propose a classification ability ranking (CAR)-based TDC method to improve the performance of a KNN classifier. The proposed classification ability function ranks a training sample in terms of its contribution to correctly classifying other training samples as a KNN through the leave-one-out (LV1) strategy in the cleaning stage. A training sample that is likely to cause the misclassification of other samples during the KNN classifications according to the LV1 strategy is considered to have lower classification ability and will be cleaned out from the original training data set. Extensive experiments, based on ten real-world data sets, show that the proposed CAR-based TDC method can significantly reduce the classification error rates of KNN-based classifiers, while reducing computational complexity thanks to a smaller cleaned training data set. [ABSTRACT FROM AUTHOR]
Published: 2020
Full Text: View/download PDF

41. Deep Decision Tree Transfer Boosting.

Author: Jiang, Shuhui, Mao, Haiyi, Ding, Zhengming, and Fu, Yun
Subjects: *DECISION trees, *DATA distribution, *LABELS, *TASK analysis
Abstract: Instance transfer approaches consider source and target data together during the training process, and borrow examples from the source domain to augment the training data when there are limited or no labels in the target domain. Among them, boosting-based transfer learning methods (e.g., TrAdaBoost) are most widely used. When dealing with more complex data, we may consider more complex hypotheses (e.g., a decision tree with deeper layers). However, with the fixed and high complexity of the hypotheses, TrAdaBoost and its variants may face overfitting problems. Even worse, in the transfer learning scenario, a decision tree with deep layers may overfit data in the source domain that follow a different distribution. In this paper, we propose a new instance transfer learning method, i.e., Deep Decision Tree Transfer Boosting (DTrBoost), whose weights are learned and assigned to base learners by minimizing the data-dependent learning bounds across both source and target domains in terms of the Rademacher complexities. This guarantees that we can learn decision trees with deep layers without overfitting. The theoretical proof and experimental results indicate the effectiveness of our proposed method. [ABSTRACT FROM AUTHOR] more...
Published: 2020
Full Text: View/download PDF

42. Large-Margin Label-Calibrated Support Vector Machines for Positive and Unlabeled Learning.

Author: Gong, Chen, Liu, Tongliang, Yang, Jian, and Tao, Dacheng
Subjects: *SUPPORT vector machines, *DATA distribution, *LEARNING problems
Abstract: Positive and unlabeled learning (PU learning) aims to train a binary classifier based on only PU data. Existing methods usually cast PU learning as a label noise learning problem or a cost-sensitive learning problem. However, none of them fully takes the data distribution information into consideration when designing the model, which hinders them from achieving better performance. In this paper, we argue that the clusters formed by positive examples and potential negative examples in the feature space should be critically utilized to establish the PU learning model, especially when the negative data are not explicitly available. To this end, we introduce a hat loss to discover the margin between data clusters and a label calibration regularizer to amend the biased decision boundary to the potentially correct one, and we propose a novel discriminative PU classifier termed “Large-margin Label-calibrated Support Vector Machines” (LLSVM). Our LLSVM classifier can work properly in the absence of negative training examples and effectively achieve the max-margin effect between positive and negative classes. Theoretically, we derive the generalization error bound of LLSVM, which reveals that the introduction of PU data does help to enhance the algorithm performance. Empirically, we compare LLSVM with state-of-the-art PU methods on various synthetic and practical data sets, and the results confirm that the proposed LLSVM is more effective than the other compared methods in dealing with PU learning tasks. [ABSTRACT FROM AUTHOR] more...
Published: 2019
Full Text: View/download PDF

43. Semisupervised Discriminant Multimanifold Analysis for Action Recognition.

Author: Xu, Zengmin, Hu, Ruimin, Chen, Jun, Chen, Chen, Jiang, Junjun, Li, Jiaofen, and Li, Hongyang
Subjects: *DISCRIMINANT analysis, *SUBSPACES (Mathematics), *CONJUGATE gradient methods, *MATRIX inversion, *DATA distribution, *HUMAN behavior, *DEEP learning, *HUMAN activity recognition, *DATA structures
Abstract: Although recent semisupervised approaches have proven their effectiveness when there are limited training data, they assume that the samples from different actions lie on a single data manifold in the feature space and try to uncover a common subspace for all samples. However, this assumption ignores both the intraclass compactness and the interclass separability. We believe that human actions should occupy a multimanifold subspace and, therefore, model the samples of the same action as the same manifold and those of different actions as different manifolds. In obtaining the optimum subspace projection matrix, the current approaches may be mathematically imprecise owing to a badly scaled matrix and improper convergence. To address these issues in unconstrained convex optimization, we introduce a nontrivial spectral projected gradient method and Karush–Kuhn–Tucker conditions without matrix inversion. By maximizing the separability between different classes using labeled data points and estimating the intrinsic geometric structure of the data distributions by exploring unlabeled data points, the proposed algorithm can learn global and local consistency and boost the recognition performance. Extensive experiments conducted on realistic video data sets, including JHMDB, HMDB51, UCF50, and UCF101, have demonstrated that our algorithm outperforms the compared algorithms, including a deep learning approach, when there are only a few labeled samples. [ABSTRACT FROM AUTHOR] more...
Published: 2019
Full Text: View/download PDF

44. Deep Latent Low-Rank Representation for Face Sketch Synthesis.

Author: Zhang, Mingjin, Wang, Nannan, Li, Yunsong, and Gao, Xinbo
Subjects: *DRAWING, *FACE, *HAIRSTYLES
Abstract: Face sketch synthesis is useful and profitable in digital entertainment. Most existing face sketch synthesis methods rely on the assumption that facial photographs/sketches form a low-dimensional manifold. When the training data are insufficient, the manifold cannot characterize the identity-specific information that is included in a test photograph but excluded from the training data. Thus, the synthesized sketch loses this information, such as glasses, earrings, hairstyles, and hairpins. To provide sufficient data and satisfy the manifold assumption, we propose a novel face sketch synthesis framework based on deep latent low-rank representation (DLLRR) in this paper. The DLLRR adds hidden training sketches carrying the identity-specific information, as the hidden data, to the insufficient original training sketches, as the observed data. It then searches for the lowest rank representation on the candidates of a test photograph from both the hidden and observed data. Because of the strong representational capability of the coupled autoencoder, we leverage it to reveal the hidden data. Experimental results on a face photograph–sketch database illustrate that the proposed method can successfully provide sufficient training data with the identity-specific information. Compared to state-of-the-art methods, the proposed method synthesizes cleaner and more vivid face sketches. [ABSTRACT FROM AUTHOR] more...
Published: 2019
Full Text: View/download PDF

45. Learning Aggregated Transmission Propagation Networks for Haze Removal and Beyond.

Author: Liu, Risheng, Fan, Xin, Hou, Minjun, Jiang, Zhiying, Luo, Zhongxuan, and Zhang, Lei
Subjects: *HAZE, *IMAGE intensifiers, *IMAGE color analysis, *MATHEMATICAL programming, *TASK performance, *PLANT propagation
Abstract: Single-image dehazing is an important low-level vision task with many applications. Early studies have investigated different kinds of visual priors to address this problem. However, they may fail when their assumptions are not valid on specific images. Recent deep networks also achieve a relatively good performance in this task. Unfortunately, because they disregard the rich physical rules of haze, a large amount of data is required for their training. More importantly, they may still fail when there exist completely different haze distributions in testing images. By combining these two perspectives, this paper designs a novel residual architecture to aggregate both prior (i.e., domain knowledge) and data (i.e., haze distribution) information to propagate transmissions for scene radiance estimation. We further present a variational energy-based perspective to investigate the intrinsic propagation behavior of our aggregated deep model. In this way, we bridge the gap between prior-driven models and data-driven networks, leveraging the advantages while avoiding the limitations of previous dehazing approaches. A lightweight learning framework is proposed to train our propagation network. Finally, by introducing a task-aware image separation formulation with a flexible optimization scheme, we extend the proposed model to more challenging vision tasks, such as underwater image enhancement and single-image rain removal. Experiments on both synthetic and real-world images demonstrate the effectiveness and efficiency of the proposed framework. [ABSTRACT FROM AUTHOR] more...
Published: 2019
Full Text: View/download PDF

46. Learning With Annotation of Various Degrees.

Author: Zhou, Joey Tianyi, Fang, Meng, Zhang, Hao, Gong, Chen, Peng, Xi, Cao, Zhiguo, and Goh, Rick Siow Mong
Subjects: *RANDOM fields, *SUPERVISED learning, *ANNOTATIONS, *LABELS, *DEEP learning, *DRUG labeling, *MARKOV processes
Abstract: In this paper, we study a new problem in the scenario of sequence labeling. To be exact, we consider training data with annotation of various degrees, namely, fully labeled, unlabeled, and partially labeled sequences. Learning with fully labeled or unlabeled sequences corresponds to the standard setting in traditional supervised or unsupervised learning, and the proposed partial labeling specifies the subject that the element does not belong to. The partially labeled data are cheaper to obtain than the fully labeled data, though they are less informative, especially when the tasks require a lot of domain knowledge. To solve such a practical challenge, we propose a novel deep conditional random field (CRF) model which utilizes an end-to-end learning manner to smoothly handle fully labeled, unlabeled, and partially labeled sequences within a unified framework. To the best of our knowledge, this could be one of the first works to utilize partially labeled instances for sequence labeling, and the proposed algorithm unifies deep learning and CRF in an end-to-end framework. Extensive experiments show that our method achieves state-of-the-art performance in two sequence labeling tasks on some popular data sets. [ABSTRACT FROM AUTHOR] more...
Published: 2019
Full Text: View/download PDF

47. Pool-Based Sequential Active Learning for Regression.

Author: Wu, Dongrui
Subjects: *MACHINE learning, *SEQUENTIAL learning, *DRUG labeling
Abstract: Active learning (AL) is a machine-learning approach for reducing the data labeling effort. Given a pool of unlabeled samples, it tries to select the most useful ones to label so that a model built from them can achieve the best possible performance. This paper focuses on pool-based sequential AL for regression (ALR). We first propose three essential criteria that an ALR approach should consider in selecting the most useful unlabeled samples: informativeness, representativeness, and diversity, and compare four existing ALR approaches against them. We then propose a new ALR approach using passive sampling, which considers both the representativeness and the diversity in both the initialization and subsequent iterations. Remarkably, this approach can also be integrated with other existing ALR approaches in the literature to further improve the performance. Extensive experiments on 11 University of California, Irvine, Carnegie Mellon University StatLib, and University of Florida Media Core data sets from various domains verified the effectiveness of our proposed ALR approaches. [ABSTRACT FROM AUTHOR] more...
Published: 2019
Full Text: View/download PDF

48. Neural-Response-Based Extreme Learning Machine for Image Classification.

Author: Li, Hongfeng, Zhao, Hongkai, and Li, Hong
Subjects: *FEEDFORWARD neural networks, *MATHEMATICAL regularization, *FEATURE extraction
Abstract: This paper proposes a novel and simple multilayer feature learning method for image classification by employing the extreme learning machine (ELM). The proposed algorithm is composed of two stages: the multilayer ELM (ML-ELM) feature mapping stage and the ELM learning stage. The ML-ELM feature mapping stage is recursively built by alternating between feature map construction and maximum pooling operation. In particular, the input weights for constructing feature maps are randomly generated and hence need not be trained or tuned, which makes the algorithm highly efficient. Moreover, the maximum pooling operation enables the algorithm to be invariant to certain transformations. During the ELM learning stage, elastic-net regularization is proposed to learn the output weight. Elastic-net regularization helps to learn more compact and meaningful output weight. In addition, we preprocess the input data with the dense scale-invariant feature transform operation to improve both the robustness and invariance of the algorithm. To evaluate the effectiveness of the proposed method, several experiments are conducted on three challenging databases. Compared with the conventional deep learning methods and other related ones, the proposed method achieves the best classification results with high computational efficiency. [ABSTRACT FROM AUTHOR] more...
Published: 2019
Full Text: View/download PDF

49. Blind Denoising Autoencoder.

Author: Majumdar, Angshul
Subjects: *SIGNAL denoising, *MACHINE learning
Abstract: The term “blind denoising” refers to the fact that the basis used for denoising is learned from the noisy sample itself during denoising. Dictionary learning- and transform learning-based formulations for blind denoising are well known, but there has been no autoencoder-based solution for this blind denoising approach. So far, autoencoder-based denoising formulations have learned the model on separate training data and have used the learned model to denoise test samples. Such a methodology fails when the test image (to denoise) is not of the same kind as the data the model was learned with. This is the first work in which we learn the autoencoder from the noisy sample while denoising. Experimental results show that our proposed method performs better than dictionary learning (K-singular value decomposition), transform learning, sparse stacked denoising autoencoder, and the gold standard BM3D algorithm. [ABSTRACT FROM AUTHOR] more...
Published: 2019
Full Text: View/download PDF

50. A Game-Theoretic Approach to Design Secure and Resilient Distributed Support Vector Machines.

Author: Zhang, Rui and Zhu, Quanyan
Subjects: *SUPPORT vector machines, *NASH equilibrium
Abstract: Distributed support vector machines (DSVMs) have been developed to solve large-scale classification problems in networked systems with a large number of sensors and control units. However, the systems become more vulnerable, as detection and defense are increasingly difficult and expensive. This paper aims to develop secure and resilient DSVM algorithms under adversarial environments in which an attacker can manipulate the training data to achieve his objective. We establish a game-theoretic framework to capture the conflicting interests between an adversary and a set of distributed data processing units. The Nash equilibrium of the game allows predicting the outcome of learning algorithms in adversarial environments and enhancing the resilience of the machine learning through dynamic distributed learning algorithms. We prove that the convergence of the distributed algorithm is guaranteed without assumptions on the training data or network topologies. Numerical experiments are conducted to corroborate the results. We show that the network topology plays an important role in the security of DSVM. Networks with fewer nodes and higher average degrees are more secure. Moreover, a balanced network is found to be less vulnerable to attacks. [ABSTRACT FROM AUTHOR] more...
Published: 2018
Full Text: View/download PDF

203 results on '"training data"'
