26 results for "Dacheng Tao"
Search Results
2. Multi-target Knowledge Distillation via Student Self-reflection
- Author
-
Jianping Gou, Xiangshuo Xiong, Baosheng Yu, Lan Du, Yibing Zhan, and Dacheng Tao
- Subjects
Artificial Intelligence, Computer Vision and Pattern Recognition, Software
- Abstract
Knowledge distillation is a simple yet effective technique for deep model compression, which aims to transfer the knowledge learned by a large teacher model to a small student model. To mimic how a teacher teaches a student, existing knowledge distillation methods mainly adopt a unidirectional knowledge transfer, where the knowledge extracted from different intermediate layers of the teacher model is used to guide the student model. However, in real-world education scenarios students learn more effectively through multi-stage learning with self-reflection, which is nevertheless ignored by current knowledge distillation methods. Inspired by this, we devise a new knowledge distillation framework, termed multi-target knowledge distillation via student self-reflection (MTKD-SSR), which not only enhances the teacher’s ability to unfold the knowledge to be distilled, but also improves the student’s capacity to digest the knowledge. Specifically, the proposed framework consists of three target knowledge distillation mechanisms: stage-wise channel distillation (SCD), stage-wise response distillation (SRD), and cross-stage review distillation (CRD). SCD and SRD transfer feature-based knowledge (i.e., channel features) and response-based knowledge (i.e., logits) at different stages, respectively, while CRD encourages the student model to conduct self-reflective learning after each stage via self-distillation of the response-based knowledge. Experimental results on five popular visual recognition datasets, CIFAR-100, Market-1501, CUB200-2011, ImageNet, and Pascal VOC, demonstrate that the proposed framework significantly outperforms recent state-of-the-art knowledge distillation methods.
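The response-based (logit) transfer that SRD performs can be illustrated with the classic softened-softmax distillation loss. This is a minimal sketch of the general mechanism, not the paper's exact formulation; the temperature value is illustrative.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def response_distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on temperature-softened logits, scaled by T^2 --
    the generic form of response-based (logit) knowledge transfer."""
    p = softmax(teacher_logits, T)  # soft teacher targets
    q = softmax(student_logits, T)  # soft student predictions
    return float(T * T * np.sum(p * np.log(p / q)))
```

The loss is zero when student and teacher logits agree and positive otherwise, so minimizing it pulls the student's output distribution toward the teacher's.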
- Published
- 2023
3. Trust-Region Adaptive Frequency for Online Continual Learning
- Author
-
Yajing Kong, Liu Liu, Maoying Qiao, Zhen Wang, and Dacheng Tao
- Subjects
Artificial Intelligence, Computer Vision and Pattern Recognition, Software
- Abstract
In the paradigm of online continual learning, a neural network is exposed to a sequence of tasks, where the data arrive in an online fashion and previously seen data are not accessible. This online setting causes insufficient learning and severe forgetting of past tasks, preventing a good stability-plasticity trade-off: ideally, the network should have high plasticity to adapt well to new tasks while retaining the stability to prevent forgetting of old tasks. To solve these issues, we propose a trust-region adaptive frequency approach, which alternates between standard-process and intra-process updates. Specifically, the standard-process replays data stored in a coreset and interleaves them with the current data, while the intra-process updates the network parameters based on the coreset. Furthermore, to improve the unsatisfactory performance stemming from the online setting, the frequency of the intra-process is adjusted based on a trust region, which is measured by the confidence score of the current data. During the intra-process, we distill the dark knowledge to retain useful learned knowledge. Moreover, to store more representative data in the coreset, a confidence-based coreset selection is performed in an online manner. Experimental results on standard benchmarks show that the proposed method significantly outperforms state-of-the-art continual learning algorithms.
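As a rough illustration of how a confidence score could gate the intra-process frequency, the sketch below triggers an extra coreset update when the batch confidence leaves an assumed trust region. The function name and the threshold value are hypothetical, not taken from the paper.

```python
import numpy as np

def softmax(z):
    e = np.exp(np.asarray(z, dtype=float) - np.max(z))
    return e / e.sum()

def should_run_intra_process(batch_logits, batch_labels, trust_threshold=0.6):
    """Hypothetical trust-region gate: run an extra coreset (intra-process)
    update when the mean confidence assigned to the true class of the current
    batch drops below the threshold, signalling insufficient learning."""
    confidences = [softmax(logits)[label]
                   for logits, label in zip(batch_logits, batch_labels)]
    return float(np.mean(confidences)) < trust_threshold
```

A confident batch skips the extra update; an uncertain one triggers it, so intra-process frequency adapts to how well the current data are already handled.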
- Published
- 2023
4. Region-adaptive Concept Aggregation for Few-shot Visual Recognition
- Author
-
Mengya Han, Yibing Zhan, Baosheng Yu, Yong Luo, Han Hu, Bo Du, Yonggang Wen, and Dacheng Tao
- Subjects
Concept Learning, Concept Aggregation, Computer Science and Engineering
- Abstract
Few-shot learning (FSL) aims to learn novel concepts from very limited examples. However, most FSL methods suffer from a lack of robustness in concept learning. Specifically, existing FSL methods usually ignore the diversity of region contents, which may contain concept-irrelevant information such as the background; this introduces bias/noise and degrades the performance of conceptual representation learning. To address this issue, we propose a novel metric-based FSL method termed region-adaptive concept aggregation network (RCA-Net). Specifically, we devise a region-adaptive concept aggregator (RCA) to model the relationships between different regions and capture the conceptual information in each region, which is then integrated in a weighted-average manner to obtain the conceptual representation. Consequently, robust concept learning is achieved by focusing more on concept-relevant information and less on concept-irrelevant information. We perform extensive experiments on three popular visual recognition benchmarks to demonstrate the superiority of RCA-Net for robust few-shot learning. In particular, on the Caltech-UCSD Birds-200-2011 (CUB200) dataset, the proposed RCA-Net significantly improves 1-shot accuracy from 74.76% to 78.03% and 5-shot accuracy from 86.84% to 89.83% compared with the most competitive counterpart. This work was supported by the National Natural Science Foundation of China (No. 62002090), the Major Science and Technology Innovation 2030 “New Generation Artificial Intelligence” Key Project (No. 2021ZD0111700), and the Special Fund of Hubei Luojia Laboratory, China (No. 220100014).
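The weighted-average aggregation step can be sketched as softmax-weighted pooling over region features. The relevance scores are assumed to be given here, whereas the paper learns them with the RCA module.

```python
import numpy as np

def aggregate_regions(region_features, relevance_scores):
    """Weighted average of per-region features, with weights from a softmax
    over relevance scores, so concept-irrelevant regions (e.g. background)
    are down-weighted in the conceptual representation."""
    s = np.asarray(relevance_scores, dtype=float)
    w = np.exp(s - s.max())
    w /= w.sum()                          # softmax weights over regions
    F = np.asarray(region_features, dtype=float)
    return (w[:, None] * F).sum(axis=0)   # weighted average feature
```

With one region scored far above the others, the output collapses to that region's feature; with equal scores it reduces to plain mean pooling.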
- Published
- 2023
5. A Unified B-Spline Framework for Scale-Invariant Keypoint Detection
- Author
-
Qi Zheng, Mingming Gong, Xinge You, and Dacheng Tao
- Subjects
Artificial Intelligence, Computer Vision and Pattern Recognition, Software
- Published
- 2022
6. CODON: On Orchestrating Cross-Domain Attentions for Depth Super-Resolution
- Author
-
Yuxiang Yang, Qi Cao, Jing Zhang, and Dacheng Tao
- Subjects
Artificial Intelligence, Computer Vision and Pattern Recognition, Software
- Published
- 2022
7. Semantic Edge Detection with Diverse Deep Supervision
- Author
-
Dacheng Tao, Jia-Wang Bian, Yun Liu, Deng-Ping Fan, Le Zhang, and Ming-Ming Cheng
- Subjects
Computer Science, Computer Vision and Pattern Recognition (cs.CV), Visual Object Recognition, Pattern Recognition, Semantics, Convolutional Neural Networks, Edge Detection, Segmentation, Artificial Intelligence, Software
- Abstract
Semantic edge detection (SED), which aims at jointly extracting edges as well as their category information, has far-reaching applications in domains such as semantic segmentation, object proposal generation, and object recognition. SED naturally requires achieving two distinct supervision targets: locating fine detailed edges and identifying high-level semantics. Our motivation comes from the hypothesis that such distinct targets prevent state-of-the-art SED methods from effectively using deep supervision to improve results. To this end, we propose a novel fully convolutional neural network using diverse deep supervision (DDS) within a multi-task framework, where bottom layers aim at generating category-agnostic edges while top layers are responsible for the detection of category-aware semantic edges. To overcome the hypothesized supervision challenge, a novel information converter unit is introduced, whose effectiveness has been extensively evaluated on the SBD and Cityscapes datasets. (International Journal of Computer Vision)
- Published
- 2021
8. 3D-FUTURE: 3D Furniture Shape with TextURE
- Author
-
Stephen J. Maybank, Binqiang Zhao, Rongfei Jia, Dacheng Tao, Mingming Gong, Lin Gao, and Huan Fu
- Subjects
Computer Science, Computer Vision and Pattern Recognition (cs.CV), Image Processing and Computer Vision, 3D Modeling, Texture, Artificial Intelligence, Computer Vision, Polygon Meshes, Segmentation, Pose, Software, Computer Graphics
- Abstract
The 3D CAD shapes in current 3D benchmarks are mostly collected from online model repositories. They therefore typically have insufficient geometric detail and less informative textures, making them less attractive for comprehensive and subtle research in areas such as high-quality 3D mesh and texture recovery. This paper presents 3D Furniture shape with TextURE (3D-FUTURE): a richly annotated and large-scale repository of 3D furniture shapes in the household scenario. At the time of this technical report, 3D-FUTURE contains 20,240 clean and realistic synthetic images of 5,000 different rooms, and 9,992 unique detailed 3D instances of furniture with high-resolution textures. Experienced designers developed the room scenes, and the 3D CAD shapes in the scenes are used for industrial production. Given the well-organized 3D-FUTURE, we provide baseline experiments on several widely studied tasks, such as joint 2D instance segmentation and 3D object pose estimation, image-based 3D shape retrieval, 3D object reconstruction from a single image, and texture recovery for 3D shapes, to facilitate future research on our database. (Project page: https://tianchi.aliyun.com/specials/promotion/alibaba-3d-future)
- Published
- 2021
9. Local AdaGrad-type algorithm for stochastic convex-concave optimization
- Author
-
Luofeng Liao, Li Shen, Jia Duan, Mladen Kolar, and Dacheng Tao
- Subjects
Computer Science, Machine Learning (cs.LG), Distributed, Parallel, and Cluster Computing (cs.DC), Mathematics, Optimization and Control (math.OC), Artificial Intelligence, Software
- Abstract
Large-scale convex-concave minimax problems arise in numerous applications, including game theory, robust training, and the training of generative adversarial networks. Despite their wide applicability, solving such problems efficiently and effectively with existing stochastic minimax methods is challenging in the presence of large amounts of data. We study a class of stochastic minimax methods and develop a communication-efficient distributed stochastic extragradient algorithm, LocalAdaSEG, with an adaptive learning rate suitable for solving convex-concave minimax problems in the parameter-server model. LocalAdaSEG has three main features: (i) a periodic communication strategy that reduces the communication cost between workers and the server; (ii) an adaptive learning rate that is computed locally and allows for tuning-free implementation; and (iii) theoretically, a nearly linear speed-up with respect to the dominant variance term, arising from the estimation of the stochastic gradient, proven in both the smooth and nonsmooth convex-concave settings. LocalAdaSEG is used to solve a stochastic bilinear game and to train a generative adversarial network. We compare LocalAdaSEG against several existing optimizers for minimax problems and demonstrate its efficacy through several experiments in both homogeneous and heterogeneous settings. (42 pages; accepted to Machine Learning, 2022)
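The extragradient template that LocalAdaSEG extends can be shown on the simplest bilinear game min_x max_y f(x, y) = xy. This single-worker, deterministic sketch omits the periodic communication and the adaptive learning rate.

```python
def extragradient_step(x, y, lr=0.1):
    """One extragradient step on min_x max_y f(x, y) = x * y,
    where grad_x f = y and grad_y f = x."""
    x_half = x - lr * y      # extrapolation (look-ahead) step
    y_half = y + lr * x
    x_new = x - lr * y_half  # real update, using look-ahead gradients
    y_new = y + lr * x_half
    return x_new, y_new

# Plain gradient descent-ascent spirals outward on this game;
# the extragradient iterates instead contract toward the saddle point (0, 0).
x, y = 1.0, 1.0
for _ in range(1000):
    x, y = extragradient_step(x, y)
```

The look-ahead evaluation of the gradients is what turns the divergent rotation of naive descent-ascent into convergence.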
- Published
- 2022
10. Polysemy Deciphering Network for Robust Human–Object Interaction Detection
- Author
-
Xubin Zhong, Changxing Ding, Xian Qu, and Dacheng Tao
- Subjects
Artificial Intelligence, Computer Vision and Pattern Recognition, Software
- Published
- 2021
11. SCGAN: stacking-based generative adversarial networks for multi-fidelity surrogate modeling
- Author
-
Chao Zhang, Lixue Liu, Hao Wang, Xueguan Song, and Dacheng Tao
- Subjects
Control and Optimization, Control and Systems Engineering, Computer Graphics and Computer-Aided Design, Software, Computer Science Applications
- Published
- 2022
12. Knowledge Distillation: A Survey
- Author
-
Baosheng Yu, Stephen J. Maybank, Jianping Gou, and Dacheng Tao
- Subjects
Computer Science, Machine Learning (cs.LG), Machine Learning (stat.ML), Computational Complexity Theory, Artificial Intelligence, Architecture, Distillation, Deep Learning, Data Science, Pattern Recognition, Scalability, Computer Vision and Pattern Recognition, Software
- Abstract
In recent years, deep neural networks have been successful in both industry and academia, especially for computer vision tasks. The great success of deep learning is mainly due to its scalability to encode large-scale data and to maneuver billions of model parameters. However, it is a challenge to deploy these cumbersome deep models on devices with limited resources, e.g., mobile phones and embedded devices, not only because of the high computational complexity but also because of the large storage requirements. To this end, a variety of model compression and acceleration techniques have been developed. As a representative type of model compression and acceleration, knowledge distillation effectively learns a small student model from a large teacher model, and has received rapidly increasing attention from the community. This paper provides a comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, teacher-student architectures, distillation algorithms, performance comparison, and applications. Furthermore, challenges in knowledge distillation are briefly reviewed and directions for future research are discussed. (Accepted for publication in International Journal of Computer Vision, 2021)
- Published
- 2021
13. Recursive Context Routing for Object Detection
- Author
-
Zhe Chen, Jing Zhang, and Dacheng Tao
- Subjects
Context Modeling, Computer Science, Object Detection, Routing Algorithms, PASCAL VOC, Machine Learning, Bounding Boxes, Segmentation, Artificial Intelligence, Computer Vision and Pattern Recognition, Software
- Abstract
Recent studies have confirmed that modeling contexts is important for object detection. However, current context modeling approaches still have limited expressive capacity and dynamics to encode contextual relationships and model contexts, which deteriorates their effectiveness. In this paper, we instead seek to recast the current context modeling framework and perform more dynamic context modeling for object detection. In particular, we devise a novel Recursive Context Routing (ReCoR) mechanism to encode contextual relationships and model contexts more effectively. ReCoR progressively models more contexts through a recursive structure, providing a more feasible and comprehensive way to exploit complicated contexts and contextual relationships. At each recursive stage, we further decompose the modeling of contexts and contextual relationships into a spatial modeling process and a channel-wise modeling process, avoiding the need to exhaustively model all potential pair-wise contextual relationships in a single pass. The spatial modeling process focuses on spatial contexts and gradually involves more of them as the recursion deepens. In the channel-wise modeling process, we introduce a context routing algorithm to improve the efficacy of dynamically modeling channel-wise contextual relationships. We perform a comprehensive evaluation of the proposed ReCoR on the popular MS COCO and PASCAL VOC datasets. The effectiveness of ReCoR is validated on both datasets by the consistent performance gains obtained when applying our method to different baseline object detectors. For example, on MS COCO, our approach delivers around 10% relative improvement for a Mask R-CNN detector on the bounding-box task and around 7% relative improvement on the instance segmentation task, surpassing existing context modeling approaches by a large margin. State-of-the-art detection performance can also be achieved by applying ReCoR to the Cascade Mask R-CNN detector, illustrating the great benefits of our method for improving context modeling and object detection.
- Published
- 2020
14. Multi-task Compositional Network for Visual Relationship Detection
- Author
-
Jun Yu, Dacheng Tao, Ting Yu, and Yibing Zhan
- Subjects
Feature Fusion, Computer Science, Pattern Recognition, Predicates, Object Detection, Artificial Intelligence, Computer Vision and Pattern Recognition, Software
- Abstract
Previous methods treat visual relationship detection as a combination of object detection and predicate detection. However, natural images often contain hundreds of objects and thousands of object pairs, and relying only on object detection and predicate detection is insufficient for effective visual relationship detection because the significant relationships are easily overwhelmed by the dominant, less significant ones. In this paper, we propose a novel subtask for visual relationship detection, significance detection, as the complement of object detection and predicate detection. Significance detection refers to the task of identifying object pairs with significant relationships. Meanwhile, we propose a novel multi-task compositional network (MCN) that simultaneously performs object detection, predicate detection, and significance detection. MCN consists of three modules: an object detector, a relationship generator, and a relationship predictor. The object detector detects objects, the relationship generator provides candidate relationships, and the relationship predictor produces significance scores and predicts predicates. Furthermore, MCN employs a multimodal feature fusion strategy based on visual, spatial, and label features, and a novel correlated loss function to deeply combine object detection, predicate detection, and significance detection. MCN is validated on two datasets, the Visual Relationship Detection dataset and the Visual Genome dataset. Experimental comparisons with state-of-the-art methods verify the competitiveness of MCN and the usefulness of significance detection in visual relationship detection.
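Significance detection, stripped to its essence, is a ranking problem over candidate object pairs. The sketch below (with made-up pairs and scores) simply keeps the top-k pairs by significance score before any predicate prediction.

```python
def significant_pairs(pair_scores, k):
    """Keep the k candidate object pairs with the highest significance
    scores, so the few significant relationships are not drowned out by
    the many insignificant ones. `pair_scores` maps an
    (object_i, object_j) pair to a significance score."""
    ranked = sorted(pair_scores, key=pair_scores.get, reverse=True)
    return ranked[:k]
```

In the full model this filtering is learned jointly with object and predicate detection rather than applied as a hard post-hoc cut.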
- Published
- 2020
15. Semi-online Multi-people Tracking by Re-identification
- Author
-
Long Lan, Dacheng Tao, Xinchao Wang, Gang Hua, and Thomas S. Huang
- Subjects
Markov Random Fields, Computer Science, Deep Learning, Tracking, Artificial Intelligence, Pattern Recognition, Computer Vision, Computer Vision and Pattern Recognition, Software
- Abstract
In this paper, we propose a novel semi-online approach to tracking multiple people. In contrast to conventional offline approaches that take the whole image sequence as input, our semi-online approach tracks people in a frame-by-frame manner by exploring the temporal, spatial, and multi-camera relationships of detection hypotheses in near-future frames. We cast the multi-people tracking task as a re-identification problem, and explicitly account for objects’ appearance changes and longer-term associations. We model our approach using a multi-label Markov random field, and introduce a fast α-expansion algorithm to solve it efficiently. To the best of our knowledge, this is the first semi-online approach based on re-identification. It yields very promising tracking results, especially in challenging cases such as crowded streets where pedestrians frequently occlude each other, scenes captured with moving cameras where objects may disappear and reappear randomly, and videos with changing illumination where object appearances vary.
- Published
- 2020
16. Classification with label noise: a Markov chain sampling framework
- Author
-
Lingyang Chu, Zijin Zhao, Dacheng Tao, and Jian Pei
- Subjects
Probabilistic Classification, Stationary Distributions, Markov Chains, Computer Networks and Communications, Computer Science, Sampling, Pattern Recognition, Label Noise, Artificial Intelligence, Computer Science Applications, Information Systems
- Abstract
The effectiveness of classification methods relies largely on the correctness of instance labels. In real applications, however, instance labels are often not highly reliable due to the presence of label noise. Training effective classifiers in the presence of label noise is a challenging task with many real-world applications. In this paper, we propose a Markov chain sampling (MCS) framework that accurately identifies mislabeled instances and robustly learns effective classifiers. MCS builds a Markov chain where each state uniquely represents a set of randomly sampled instances. We show that the Markov chain has a unique stationary distribution, which puts much larger probability weight on states dominated by correctly labeled instances than on states dominated by mislabeled instances. We propose a Markov chain Monte Carlo sampling algorithm to approximate the stationary distribution, which is then used to compute the mislabeling probability of each instance and to train noise-resistant classifiers. The MCS framework is highly compatible with a wide spectrum of classifiers that produce probabilistic classification results. Extensive experiments on both real and synthetic data sets demonstrate the superior effectiveness and efficiency of the proposed MCS framework.
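The framework rests on the chain having a unique stationary distribution. The toy sketch below computes that distribution for an explicit row-stochastic matrix by power iteration; MCS itself must approximate it by sampling, since its state space (all instance subsets) is far too large to enumerate.

```python
import numpy as np

def stationary_distribution(P, tol=1e-12, max_iter=100_000):
    """Power-iterate a row-stochastic transition matrix P to its stationary
    distribution pi (pi @ P == pi), which is unique for an ergodic chain.
    This is the object MCS ranks states by."""
    P = np.asarray(P, dtype=float)
    pi = np.full(P.shape[0], 1.0 / P.shape[0])  # start uniform
    for _ in range(max_iter):
        nxt = pi @ P
        if np.abs(nxt - pi).max() < tol:
            return nxt
        pi = nxt
    return pi
```

For a two-state chain with P = [[0.9, 0.1], [0.5, 0.5]] the long-run mass concentrates on the "sticky" first state, analogous to the clean-label-dominated states in MCS.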
- Published
- 2018
17. Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis
- Author
-
Dacheng Tao, Lihao Nan, James G. Burchfield, Pengyi Yang, Jean Yee Hwa Yang, Thomas A Geddes, and Taiyun Kim
- Subjects
Data Analysis, Single Cells, Computer Science, Random Projection, Biochemistry, Cluster Ensembles, Structural Biology, scRNA-seq, Cluster Analysis, RNA-Seq, Molecular Biology, Artificial Neural Networks, Cell Type Identification, Applied Mathematics, Pattern Recognition, Autoencoders, Computer Science Applications, Single-Cell Transcriptomics, Transcriptome, Algorithms
- Abstract
Background: Single-cell RNA-sequencing (scRNA-seq) is a transformative technology, allowing global transcriptomes of individual cells to be profiled with high accuracy. An essential task in scRNA-seq data analysis is the identification of cell types from complex samples or tissues profiled in an experiment. To this end, clustering has become a key computational technique for grouping cells based on their transcriptome profiles, enabling subsequent cell type identification from each cluster of cells. Due to the high feature-dimensionality of the transcriptome (i.e., the large number of measured genes in each cell), and because only a small fraction of genes are cell type-specific and therefore informative for generating cell type-specific clusters, clustering directly on the original feature/gene dimension may lead to uninformative clusters and hinder correct cell type identification.
Results: Here, we propose an autoencoder-based cluster ensemble framework in which we first take random subspace projections from the data, then compress each random projection to a low-dimensional space using an autoencoder artificial neural network, and finally apply ensemble clustering across all encoded datasets to generate clusters of cells. We employ four evaluation metrics to benchmark clustering performance, and our experiments demonstrate that the proposed autoencoder-based cluster ensemble can lead to substantially improved cell type-specific clusters when applied with both the standard k-means clustering algorithm and a state-of-the-art kernel-based clustering algorithm (SIMLR) designed specifically for scRNA-seq data. Compared to directly using these clustering algorithms on the original datasets, the performance improvement is in some cases up to 100%, depending on the evaluation metric used.
Conclusions: Our results suggest that the proposed framework can facilitate more accurate cell type identification as well as other downstream analyses. The code for the proposed autoencoder-based cluster ensemble framework is freely available from https://github.com/gedcom/autoencoder_cluster_ensemble
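One standard way to realize the "ensemble clustering across all encoded datasets" step is a co-association (consensus) matrix. This minimal sketch assumes the per-view cluster labelings have already been computed (in the paper, one per autoencoder-encoded random projection).

```python
import numpy as np

def co_association(labelings):
    """Consensus (co-association) matrix for a cluster ensemble: entry (i, j)
    is the fraction of ensemble members that place cells i and j in the same
    cluster. A final clustering can then be read off this matrix."""
    L = np.asarray(labelings)                 # shape: (n_members, n_cells)
    n_cells = L.shape[1]
    C = np.zeros((n_cells, n_cells))
    for labels in L:
        C += labels[:, None] == labels[None, :]  # 1 where same cluster
    return C / len(L)
```

Cell pairs that most ensemble members agree on get entries near 1, so a simple cut or a final clustering on this matrix yields the consensus partition.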
- Published
- 2019
18. A Comprehensive Survey to Face Hallucination
- Author
-
Xuelong Li, Nannan Wang, Jie Li, Xinbo Gao, and Dacheng Tao
- Subjects
Face Hallucination, Computer Science, Speech Recognition, Intelligent Decision Support Systems, Sparse Approximation, Bayesian Inference, Machine Learning, Sketch, Artificial Intelligence, Face Detection, Pattern Recognition, Computer Vision and Pattern Recognition, Software
- Abstract
This paper comprehensively surveys the development of face hallucination (FH), including both face super-resolution and face sketch-photo synthesis techniques. Indeed, these two techniques share the same objective of inferring a target face image (e.g., a high-resolution face image, face sketch, or face photo) from a corresponding source input (e.g., a low-resolution face image, face photo, or face sketch). Considering the critical role of image interpretation in modern intelligent systems for authentication, surveillance, law enforcement, security control, and entertainment, FH has attracted growing attention in recent years. Existing FH methods can be grouped into four categories: Bayesian inference approaches, subspace learning approaches, combinations of Bayesian inference and subspace learning approaches, and sparse representation-based approaches. Despite considerable progress, the success of FH remains limited by complex application conditions such as varying illumination, pose, and viewpoint. This paper provides a holistic understanding of and deep insight into FH, and presents a comparative analysis of representative methods along with promising future directions.
- Published
- 2013
19. Compressed labeling on distilled labelsets for multi-label learning
- Author
-
Dacheng Tao, Tianyi Zhou, and Xindong Wu
- Subjects
Kullback–Leibler Divergence, Computer Science, Random Projection, Pattern Recognition, Support Vector Machines, Automatic Image Annotation, Compressed Sensing, Binary Classification, Artificial Intelligence, Cluster Analysis, Software
- Abstract
Directly applying single-label classification methods to multi-label learning problems substantially limits both performance and speed due to the imbalance, dependence, and high dimensionality of the given label matrix. Existing methods either ignore these three problems or reduce one at the price of aggravating another. In this paper, we propose a {0,1} label matrix compression and recovery method termed "compressed labeling (CL)" to simultaneously solve, or at least reduce, these three problems. CL first compresses the original label matrix to improve balance and independence by preserving the signs of its Gaussian random projections. Afterward, we directly apply popular binary classification methods (e.g., support vector machines) to each new label. A fast recovery algorithm is developed to recover the original labels from the predicted new labels. In the recovery algorithm, a "labelset distilling method" is designed to extract distilled labelsets (DLs), i.e., the frequently appearing label subsets, from the original labels via recursive clustering and subtraction. Given a distilled and an original label vector, we discover that the signs of their random projections have an explicit joint distribution that can be quickly computed by geometric inference. Based on this observation, the original label vector is exactly determined after performing a series of Kullback-Leibler divergence based hypothesis tests on the distribution of the new labels. CL significantly improves the balance of the training samples and reduces the dependence between different labels. Moreover, it accelerates the learning process by training fewer binary classifiers for compressed labels, and exploits label dependence via DL-based tests. Theoretically, we prove recovery bounds for CL, which verify the effectiveness of CL for label compression and the improvement in multi-label classification brought by the label correlations preserved in DLs. We show the effectiveness, efficiency, and robustness of CL via five groups of experiments on 21 datasets from text classification, image annotation, scene classification, music categorization, genomics, and web page classification.
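The compression step, preserving the signs of Gaussian random projections of the label matrix, is easy to sketch; the recovery side (DL extraction and the KL-based hypothesis tests) is omitted here, and the seed/shape handling is illustrative.

```python
import numpy as np

def compress_labels(Y, k, seed=0):
    """The CL compression step: project the n x d {0,1} label matrix Y
    through a d x k Gaussian random matrix and keep only the signs,
    giving k new (more balanced, less dependent) binary labels per sample."""
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((Y.shape[1], k))  # Gaussian projection matrix
    return np.sign(Y @ A)                     # entries in {-1, 0, +1}
```

One ordinary binary classifier is then trained per compressed column, which is where the speed-up over one-classifier-per-original-label comes from.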
- Published
- 2012
20. Monitoring responses of forest to climate variations by MODIS NDVI: a case study of Hun River upstream, northeastern China
- Author
-
Jinsong Yao, Wei Chen, Xiaoyu Li, Dacheng Tao, and Xingyuan He
- Subjects
Limiting Factors, Lag Effects, Forestry, Plant Science, Vegetation, Seasonality, Normalized Difference Vegetation Index, Environmental Science, Physical Geography, Moderate Resolution Imaging Spectroradiometer, Precipitation, Temperature
- Abstract
This study analyzed the temporal variation of the Moderate Resolution Imaging Spectroradiometer (MODIS) Normalized Difference Vegetation Index (NDVI) of the Hun River upstream forest in northeastern China and its correlation with climate parameters (temperature and precipitation) during the period 2000–2009. We examined the interannual variation of the forest, its seasonal variation, and the lag effects of climate variables (temperature and precipitation) on the forest using simple regression and correlation. The objective of this paper was to compare our results with previous studies and to show that conclusions drawn from broad-scale research provide policy direction, while local details are essential for local management. We found that the annual mean NDVI was significantly correlated with the annual mean temperature. The forests in our study showed insignificant increasing trends, except for the Fraxinus spp. forest. We concluded that temperature was the limiting factor of vegetation growth in our study area, and that forest in the core geographic area of its distribution was resilient to climate variation. When seasonal variation was examined, we found the largest increasing trend of seasonal mean NDVI in winter; this result differs from the outcome of previous research at the national scale. There were three-month lag effects of climate variables on vegetation in our study area in summer and autumn, which is consistent with research at broad scales. The reasons for both the differences and the agreements are discussed in this paper. We also obtained information about tree species for local management using MODIS NDVI. The results of this work suggest that local-scale information is an important complement to broad-scale research and is essential for local managers.
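The lag analysis boils down to correlating NDVI with a climate series shifted by a few months. A minimal sketch (the exact regression setup in the study may differ):

```python
import numpy as np

def lagged_correlation(ndvi, climate, lag):
    """Pearson correlation between monthly NDVI and a climate series that
    leads it by `lag` months -- the kind of statistic used to probe the
    three-month lag effects reported in the study."""
    ndvi = np.asarray(ndvi, dtype=float)
    climate = np.asarray(climate, dtype=float)
    if lag:
        # climate at month t is paired with NDVI at month t + lag
        ndvi, climate = ndvi[lag:], climate[:-lag]
    return float(np.corrcoef(ndvi, climate)[0, 1])
```

Scanning `lag` over 0..n and picking the maximizing value identifies the dominant response delay of vegetation to the climate driver.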
- Published
- 2011
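The lag analysis the abstract describes — correlating a monthly NDVI series with a climate record shifted back by up to three months — can be sketched as follows. This is a minimal illustration on synthetic data; the function and variable names are ours, not the paper's:

```python
import numpy as np

def lagged_correlation(ndvi, climate, max_lag=3):
    """Pearson correlation between a monthly NDVI series and a climate
    series (e.g. temperature), with the climate record leading NDVI by
    0..max_lag months, in the spirit of the lag analysis above."""
    results = {}
    for lag in range(max_lag + 1):
        if lag == 0:
            x, y = climate, ndvi
        else:
            x, y = climate[:-lag], ndvi[lag:]  # climate leads NDVI by `lag` months
        results[lag] = np.corrcoef(x, y)[0, 1]
    return results

# Synthetic example: NDVI responds to temperature with a 3-month delay.
rng = np.random.default_rng(0)
temp = rng.normal(size=120)                     # 10 years of monthly temperature
ndvi = np.roll(temp, 3) + 0.1 * rng.normal(size=120)
ndvi[:3] = 0.0                                  # discard wrapped-around values
corr = lagged_correlation(ndvi, temp)
best_lag = max(corr, key=corr.get)              # lag with the strongest correlation
```

On this synthetic series the strongest correlation is found at the 3-month lag, mirroring the kind of result the study reports for summer and autumn.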
21. Social image annotation via cross-domain subspace learning
- Author
-
Dacheng Tao, Si Si, Kwok-Ping Chan, and Meng Wang
- Subjects
Training set ,Computer Networks and Communications ,Computer science ,business.industry ,Software Engineering ,Pattern recognition ,Sample (statistics) ,Machine learning ,computer.software_genre ,Domain (software engineering) ,Set (abstract data type) ,Discriminative model ,Hardware and Architecture ,Media Technology ,Key (cryptography) ,Artificial Intelligence & Image Processing ,Artificial intelligence ,business ,computer ,Software ,Subspace topology - Abstract
In recent years, cross-domain learning algorithms have attracted much attention for solving the problem of insufficient labeled data. However, these cross-domain learning algorithms cannot be applied to subspace learning, which plays a key role in multimedia processing. This paper envisions cross-domain discriminative subspace learning and provides an effective solution. In particular, we propose cross-domain discriminative locally linear embedding, or CDLLE for short. CDLLE connects the training and testing samples by minimizing the quadratic distance between the distribution of the training samples and that of the testing samples, so that a common subspace for data representation can be preserved. We expect the discriminative information that separates the concepts in the training set to also separate the concepts in the testing set, and thus have a chance to address the above cross-domain problem. Margin maximization is adopted in CDLLE so that the discriminative information for separating different classes is well preserved. Finally, CDLLE encodes the local geometry of each training sample through a series of linear coefficients that reconstruct a given sample from its intra-class neighbour samples, and thus locally preserves the intra-class geometry. Experimental evidence on NUS-WIDE, a popular social image database collected from Flickr, and MSRA-MM, a popular real-world web image annotation database collected from the Internet using Microsoft Live Search, demonstrates the effectiveness of CDLLE for real-world cross-domain applications. © 2010 Springer Science+Business Media, LLC.
- Published
- 2010
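The distribution-matching idea in this abstract — a quadratic distance between the projected training (source) and testing (target) distributions — can be illustrated in its simplest form, matching only the domain means. The paper's actual criterion and solver are more involved; the names and data here are illustrative only:

```python
import numpy as np

def mean_distribution_gap(Xs, Xt, W):
    """Squared distance between the means of the projected source
    (training) and target (testing) samples -- the simplest instance of
    the quadratic distribution-distance term described in the abstract.
    W maps both domains into a shared subspace."""
    diff = (Xs @ W).mean(axis=0) - (Xt @ W).mean(axis=0)
    return float(diff @ diff)

rng = np.random.default_rng(0)
Xs = rng.normal(size=(100, 6)) + 2.0     # training domain, shifted in feature space
Xt = rng.normal(size=(80, 6))            # testing domain
W = rng.normal(size=(6, 3))              # an arbitrary projection, for illustration

gap_before = mean_distribution_gap(Xs, Xt, W)
# Centring each domain removes the mean shift, driving the gap to zero.
gap_after = mean_distribution_gap(Xs - Xs.mean(0), Xt - Xt.mean(0), W)
```

A learning method such as CDLLE would instead choose `W` itself so that this distance (and its higher-order analogues) is small while discriminative structure is retained.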
22. Manifold elastic net: a unified framework for sparse dimension reduction
- Author
-
Xindong Wu, Dacheng Tao, and Tianyi Zhou
- Subjects
FOS: Computer and information sciences ,Elastic net regularization ,Current (mathematics) ,Computer Networks and Communications ,Computer science ,Dimensionality reduction ,Nonlinear dimensionality reduction ,Machine Learning (stat.ML) ,Least squares ,Manifold ,Projection (linear algebra) ,Machine Learning (cs.LG) ,Computer Science Applications ,law.invention ,Computer Science - Learning ,Lasso (statistics) ,Statistics - Machine Learning ,law ,Artificial Intelligence & Image Processing ,Manifold (fluid mechanics) ,Algorithm ,Information Systems - Abstract
It is difficult to find the optimal sparse solution of a manifold learning based dimensionality reduction algorithm. The lasso or elastic net penalized manifold learning based dimensionality reduction is not directly a lasso penalized least squares problem, so least angle regression (LARS) (Efron et al. \cite{LARS}), one of the most popular algorithms in sparse learning, cannot be applied. Therefore, most current approaches take indirect routes or impose strict settings, which can be inconvenient for applications. In this paper, we propose the manifold elastic net, or MEN for short. MEN incorporates the merits of both manifold learning based and sparse learning based dimensionality reduction. Through a series of equivalent transformations, we show that MEN is equivalent to a lasso penalized least squares problem, and thus LARS can be adopted to obtain the optimal sparse solution of MEN. In particular, MEN has the following advantages for subsequent classification: 1) the local geometry of samples is well preserved in the low-dimensional data representation; 2) both margin maximization and classification error minimization are considered in the sparse projection calculation; 3) the projection matrix of MEN improves parsimony in computation; 4) the elastic net penalty reduces the over-fitting problem; and 5) the projection matrix of MEN can be interpreted psychologically and physiologically. Experimental evidence on face recognition over various popular datasets suggests that MEN is superior to top-level dimensionality reduction algorithms.
- Published
- 2010
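The canonical problem the abstract says MEN reduces to — lasso penalized least squares — has a well-known form. The paper solves it with LARS; as a simpler stand-in, the sketch below solves the same objective with coordinate descent on synthetic data (solver choice, names, and parameters are ours):

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for the lasso problem
        min_w  0.5 * ||y - X w||^2 + lam * ||w||_1,
    the canonical form that, per the abstract, MEN is shown to be
    equivalent to (there solved with LARS; coordinate descent is a
    simpler stand-in used only for illustration)."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(d):
            # Residual with feature j's contribution added back in.
            r_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r_j
            # Soft-thresholding update for coordinate j.
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
true_w = np.zeros(10)
true_w[[0, 3]] = [2.0, -1.5]                    # only two truly active features
y = X @ true_w + 0.01 * rng.normal(size=100)
w = lasso_cd(X, y, lam=5.0)
```

The L1 penalty drives the eight inactive coefficients to (near) zero while the two active ones are recovered with a small shrinkage bias, which is the sparsity behaviour MEN exploits for interpretable projections.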
23. Local Feature Based Geometric-Resistant Image Information Hiding
- Author
-
Xuelong Li, Cheng Deng, Xinbo Gao, and Dacheng Tao
- Subjects
Computer science ,business.industry ,Cognitive Neuroscience ,Data_MISCELLANEOUS ,Scale-invariant feature transform ,Top-hat transform ,Image processing ,Watermark ,Pattern recognition ,Invariant (physics) ,Computer Science Applications ,Information hiding ,Computer Science::Multimedia ,Computer vision ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Scaling ,Digital watermarking - Abstract
Watermarking aims to hide particular information in a carrier without changing the visual cognition of the carrier itself. Local features are good candidates for addressing the watermark synchronization error caused by geometric distortions, and have attracted great attention in content-based image watermarking. This paper presents a novel feature-point-based image watermarking scheme robust against geometric distortions. The scale invariant feature transform (SIFT) is first adopted to extract feature points and to generate, for each feature point, a disk that is invariant to translation and scaling. For each disk, orientation alignment is then performed to achieve rotation invariance. Finally, the watermark is embedded in the middle-frequency discrete Fourier transform (DFT) coefficients of each disk to improve the robustness against common image processing operations. Extensive experimental results and comparisons with representative image watermarking methods confirm the excellent performance of the proposed method in robustness against various geometric distortions as well as common image processing operations. © Springer Science+Business Media, LLC 2010.
- Published
- 2010
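The final embedding step — modulating middle-frequency DFT coefficients of a region — can be sketched on a single square patch. The real scheme operates on SIFT-derived, rotation-aligned disks and spreads a full watermark over many coefficients; the single-bit, non-blind toy below, with its chosen coefficient position and strength, is purely illustrative:

```python
import numpy as np

def embed_bit(patch, bit, strength=50.0, pos=(5, 7)):
    """Toy DFT-domain embedding: push one mid-frequency magnitude up
    (bit=1) or down (bit=0), preserving phase. `pos` and `strength`
    are illustrative choices, not values from the paper."""
    F = np.fft.fft2(patch.astype(float))
    u, v = pos
    mag, phase = np.abs(F[u, v]), np.angle(F[u, v])
    new_mag = mag + strength if bit else max(mag - strength, 0.0)
    F[u, v] = new_mag * np.exp(1j * phase)
    # Update the mirrored coefficient so the inverse transform stays real.
    F[-u, -v] = np.conj(F[u, v])
    return np.real(np.fft.ifft2(F))

def extract_bit(patch, original, pos=(5, 7)):
    """Non-blind detection for the toy scheme: compare the marked
    magnitude against the original at the embedding position."""
    u, v = pos
    return int(np.abs(np.fft.fft2(patch)[u, v]) >
               np.abs(np.fft.fft2(original.astype(float))[u, v]))

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32))       # stand-in for one aligned disk
marked = embed_bit(img, 1)
bit = extract_bit(marked, img)
```

Because the magnitude spectrum is unchanged by translation and can be resampled consistently after the disk's scale/orientation normalization, mid-frequency DFT magnitudes are a natural host for geometric-distortion-resistant embedding.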
24. Biologically Inspired Tensor Features
- Author
-
Fionn Murtagh, Dacheng Tao, Yang Mu, and Xuelong Li
- Subjects
Modality (human–computer interaction) ,Computer science ,business.industry ,Cognitive Neuroscience ,Locality ,Nonlinear dimensionality reduction ,Cognitive neuroscience of visual object recognition ,Pattern recognition ,Facial recognition system ,Computer Science Applications ,Discriminative model ,Face (geometry) ,Computer vision ,Computer Vision and Pattern Recognition ,Tensor ,Artificial intelligence ,business - Abstract
According to research results reported over the past decades, it is well acknowledged that face recognition is not a trivial task. With the development of electronic devices, we are gradually revealing the secrets of object recognition in the primate visual cortex. Therefore, it is time to reconsider face recognition using biologically inspired features. In this paper, we represent face images using the C1 units, which correspond to complex cells in the visual cortex, pooling over S1 units with a maximum operation to retain only the maximum response of each local area of S1 units. The new representation is termed C1 Face. Because C1 Face is naturally a third-order tensor (i.e., a three-dimensional array), we propose three-way discriminative locality alignment (TWDLA), an extension of discriminative locality alignment, a top-level discriminative manifold-learning-based subspace learning algorithm. TWDLA has the following advantages: (1) it takes third-order tensors as input directly, so the structure information is well preserved; (2) it models the local geometry over every modality of the input tensors, so the spatial relations of input tensors within a class are preserved; (3) it maximizes the margin between a tensor and tensors from other classes over each modality, so it performs well for recognition tasks; and (4) it has no undersampling problem. Extensive experiments on the YALE and FERET datasets show that (1) the proposed C1 Face representation represents face images better than raw pixels, and (2) TWDLA duly preserves both the local geometry and the discriminative information over every modality for recognition. © 2009 Springer Science+Business Media, LLC.
- Published
- 2009
25. Supervised tensor learning
- Author
-
Xuelong Li, Dacheng Tao, Stephen J. Maybank, Weiming Hu, and Xindong Wu
- Subjects
business.industry ,Feature extraction ,Supervised learning ,MathematicsofComputing_NUMERICALANALYSIS ,Pattern recognition ,Linear discriminant analysis ,Machine learning ,computer.software_genre ,Support vector machine ,Human-Computer Interaction ,ComputingMethodologies_PATTERNRECOGNITION ,Binary classification ,Artificial Intelligence ,Hardware and Architecture ,Metric (mathematics) ,Tensor ,Artificial intelligence ,Tensor calculus ,business ,computer ,Software ,Mathematics ,Information Systems - Abstract
Tensor representation is helpful for reducing the small sample size problem in discriminative subspace selection. As pointed out in this paper, this is mainly because the structure information of objects in computer vision research is a reasonable constraint that reduces the number of unknown parameters used to represent a learning model. Therefore, we apply this information to vector-based learning and generalize vector-based learning to tensor-based learning as the supervised tensor learning (STL) framework, which accepts tensors as input. To obtain the solution of STL, an alternating projection optimization procedure is developed. The STL framework is a combination of convex optimization and operations in multilinear algebra, and the tensor representation helps reduce the overfitting problem of vector-based learning. Based on STL and its alternating projection optimization procedure, we generalize support vector machines, the minimax probability machine, Fisher discriminant analysis, and distance metric learning to support tensor machines, the tensor minimax probability machine, tensor Fisher discriminant analysis, and multiple distance metrics learning, respectively. We also study the iterative procedure for feature extraction within STL. To examine the effectiveness of STL, we implement the tensor minimax probability machine for image classification. Compared with the minimax probability machine, the tensor version reduces the overfitting problem. © Springer-Verlag London Limited 2006.
- Published
- 2007
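The alternating-projection idea the abstract describes — fixing all but one mode of the tensor parameter and solving a standard vector problem for the remaining mode — can be sketched for the simplest case: a rank-one weight matrix for matrix-shaped inputs. The paper's framework covers SVM/MPM/FDA-style objectives; plain least squares is used below only to keep the sketch short, and all names are ours:

```python
import numpy as np

def rank1_tensor_regressor(X, y, n_iter=50):
    """Alternating optimization in the spirit of STL: learn a rank-one
    weight matrix W = u v^T for matrix-shaped inputs X_i (so the score
    is u^T X_i v) by alternately solving two ordinary least-squares
    problems, one per mode."""
    n, p, q = X.shape
    u, v = np.ones(p), np.ones(q)
    for _ in range(n_iter):
        # Fix v: each sample collapses to the vector X_i v; solve for u.
        Zu = X @ v                           # shape (n, p)
        u, *_ = np.linalg.lstsq(Zu, y, rcond=None)
        # Fix u: each sample collapses to X_i^T u; solve for v.
        Zv = np.einsum('ipq,p->iq', X, u)    # shape (n, q)
        v, *_ = np.linalg.lstsq(Zv, y, rcond=None)
    return u, v

rng = np.random.default_rng(0)
u0, v0 = rng.normal(size=4), rng.normal(size=5)
X = rng.normal(size=(200, 4, 5))
y = np.einsum('ipq,p,q->i', X, u0, v0)       # noiseless rank-one targets
u, v = rank1_tensor_regressor(X, y)
pred = np.einsum('ipq,p,q->i', X, u, v)
```

Note the parameter count: `p + q` numbers instead of `p * q` for an unconstrained weight matrix, which is exactly the structural constraint the abstract credits with easing the small sample size problem.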
26. Social media mining and knowledge discovery
- Author
-
Guo-Jun Qi, Jinhui Tang, Benoit Huet, and Dacheng Tao
- Subjects
Knowledge management ,Computer Networks and Communications ,business.industry ,Computer science ,Information sharing ,Ranking (information retrieval) ,World Wide Web ,Knowledge extraction ,Social media mining ,Hardware and Architecture ,Media Technology ,Social media ,The Internet ,business ,Image retrieval ,Software ,Similarity learning ,Information Systems - Abstract
With the rapid advances of the Internet and Web 2.0, social networking and social media have become more and more popular in people's daily lives. The ubiquitous nature of web-enabled devices, including desktops, laptops, tablets, and mobile phones, enables users to participate and interact with each other in various web communities, including photo- and video-sharing platforms, forums, newsgroups, blogs, micro-blogs, bookmarking services, and location-based services. The rapidly evolving social networks provide a platform for communication, information sharing, and collaboration among friends, colleagues, alumni, business partners, and many other social relations. Accompanying this, increasingly rich and massive heterogeneous media data have been generated by users, such as images, videos, audio, tweets, tags, categories, titles, geo-locations, comments, and viewer ratings, which offer an unprecedented opportunity for studying novel theories and technologies for social media analysis and mining. While researchers from multidisciplinary areas have proposed intelligent methods for processing social media data and employing such rich multi-modality data for various applications, it is of high interest to discover potentially important knowledge by social media mining in this nascent field. Recently, more and more research efforts have been dedicated to the aforementioned challenges and opportunities. This special issue includes five papers focusing on different aspects of social media mining and knowledge discovery.

With the popularity of social media applications, large amounts of social images associated with user tagging information are available, which can be leveraged to boost image retrieval performance. In "Sparse Semantic Metric Learning for Image Retrieval", Liu et al. propose a sparse semantic metric learning method that discovers knowledge from these social media resources and applies the learned metric to search relevant images for users. Different from traditional metric learning approaches that use similar or dissimilar constraints over a homogeneous visual space, the proposed method exploits heterogeneous information from the visual features and the tagging information of images, and formulates the learning problem as a sparse constrained one. Extensive experiments were conducted on a real-world dataset to validate the effectiveness of the proposed approach.

In most cases, visual information can be regarded as an enhanced content of the textual document. In "Relative Image Similarity Learning with Contextual Information for Internet Cross-media Retrieval", to make image-to-image similarity more consistent with document-to-document similarity, Jiang et al. propose a method to learn image similarities according to the relations of the accompanying textual documents. More specifically, instead of using static quantitative relations, a rank-based learning procedure employing structural SVM is adopted, and the ranking structure is established by comparing the relative relations of textual information. The proposed method
- Published
- 2014