9,240 results
Search Results
52. Guest Editors' Introduction to the Special Section on Award-Winning Papers from the IEEE Conference on Computer Vision and Pattern Recognition 2009 (CVPR 2009).
- Author
-
Essa, Irfan, Kang, Sing Bing, and Pollefeys, Marc
- Subjects
-
COMPUTER vision, PATTERN recognition systems, CONFERENCES & conventions
- Published
- 2011
- Full Text
- View/download PDF
53. Guest Editors' Introduction to the Special Section on Computational Photography.
- Author
-
Chakrabarti, Ayan, Sunkavalli, Kalyan, and Forsyth, David A.
- Subjects
COMPUTATIONAL photography, OPTICAL flow, ARTIFICIAL intelligence, PIXELS, COMPUTER graphics
- Abstract
An introduction is presented in which the editors discuss articles in the issue on topics including novel computational methods that exploit non-traditional sensors, novel sensors and acquisition systems, and ways of leveraging visual measurements made by unconventional sensors.
- Published
- 2020
- Full Text
- View/download PDF
54. MannequinChallenge: Learning the Depths of Moving People by Watching Frozen People.
- Author
-
Li, Zhengqi, Dekel, Tali, Cole, Forrester, Tucker, Richard, Snavely, Noah, Liu, Ce, and Freeman, William T.
- Subjects
PARALLAX, MONOCULARS, STREAMING video & television, CAMERAS, IMAGE reconstruction
- Abstract
We present a method for predicting dense depth in scenarios where both a monocular camera and people in the scene are freely moving. Existing methods for recovering depth for dynamic, non-rigid objects from monocular video impose strong assumptions on the objects’ motion and may only recover sparse depth. In this paper, we take a data-driven approach and learn human depth priors from a new source of data: thousands of Internet videos of people imitating mannequins, i.e., freezing in diverse, natural poses, while a hand-held camera tours the scene. Because people are stationary, geometric constraints hold, and thus training data can be generated using multi-view stereo reconstruction. At inference time, our method uses motion parallax cues from the static areas of the scenes to guide the depth prediction. We evaluate our method on real-world sequences of complex human actions captured by a moving hand-held camera, show improvement over state-of-the-art monocular depth prediction methods, and demonstrate various 3D effects produced using our predicted depth. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
55. Self-Supervised Visual Feature Learning With Deep Neural Networks: A Survey.
- Author
-
Jing, Longlong and Tian, Yingli
- Subjects
VISUAL learning, SUPERVISED learning, DEEP learning, COMPUTER vision
- Abstract
Large-scale labeled data are generally required to train deep neural networks in order to obtain better performance in visual feature learning from images or videos for computer vision applications. To avoid the extensive cost of collecting and annotating large-scale datasets, self-supervised learning methods, a subset of unsupervised learning methods, have been proposed to learn general image and video features from large-scale unlabeled data without using any human-annotated labels. This paper provides an extensive review of deep learning-based self-supervised general visual feature learning methods from images or videos. First, the motivation, general pipeline, and terminologies of this field are described. Then the common deep neural network architectures that are used for self-supervised learning are summarized. Next, the schema and evaluation metrics of self-supervised learning methods are reviewed, followed by the commonly used datasets for images, videos, audio, and 3D data, as well as the existing self-supervised visual feature learning methods. Finally, quantitative performance comparisons of the reviewed methods on benchmark datasets are summarized and discussed for both image and video feature learning, and the paper concludes with a set of promising future directions for self-supervised visual feature learning. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
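The self-supervision idea the survey above reviews can be illustrated with one classic pretext task: predicting image rotations, where pseudo-labels come for free from unlabeled data. A minimal sketch (the 4-way rotation task is just one of many pretext tasks the survey covers, and the helper name is ours, not from the paper):

```python
import numpy as np

def make_rotation_task(images):
    """Generate a free pseudo-labeled dataset from unlabeled images:
    each image is rotated by 0/90/180/270 degrees, and the rotation
    index serves as the label a network would be trained to predict."""
    inputs, pseudo_labels = [], []
    for img in images:
        for k in range(4):                  # k quarter-turns
            inputs.append(np.rot90(img, k))
            pseudo_labels.append(k)
    return np.stack(inputs), np.array(pseudo_labels)

unlabeled = [np.arange(9).reshape(3, 3), np.eye(3)]
x, y = make_rotation_task(unlabeled)
print(x.shape)   # (8, 3, 3): 2 images x 4 rotations
print(y)         # [0 1 2 3 0 1 2 3]
```

A network trained to predict `y` from `x` must learn orientation-sensitive features, which is the sense in which such methods need "no human-annotated labels."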
56. Cover.
- Subjects
PERIODICAL publishing
- Abstract
These instructions give guidelines for preparing papers for this publication and present information for authors publishing in this journal. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
57. TPAMI CVPR Special Section.
- Author
-
Felzenszwalb, Pedro F., Forsyth, David A., Fua, Pascal, and Boult, Terrance E.
- Subjects
CONFERENCES & conventions, COMPUTER vision, PATTERN recognition systems
- Abstract
The articles in this special issue include papers from the CVPR'11 conference which was held in Colorado Springs, CO, June 2011. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
- Full Text
- View/download PDF
58. Adversarial Attacks on Time Series.
- Author
-
Karim, Fazle, Majumdar, Somshubra, and Darabi, Houshang
- Subjects
TIME series analysis, NEAREST neighbor analysis (Statistics), DEEP learning
- Abstract
Time series classification models have been garnering significant importance in the research community. However, not much research has been done on generating adversarial samples for these models. These adversarial samples can become a security concern. In this paper, we propose utilizing an adversarial transformation network (ATN) on a distilled model to attack various time series classification models. The proposed attack on the classification model utilizes a distilled model as a surrogate that mimics the behavior of the attacked classical time series classification models. Our proposed methodology is applied to 1-nearest neighbor dynamic time warping (1-NN DTW) and a fully convolutional network (FCN), both of which are trained on 42 University of California Riverside (UCR) datasets. In this paper, we show both models were susceptible to attacks on all 42 datasets. When compared to the Fast Gradient Sign Method, the proposed attack generates a larger fraction of successful adversarial black-box attacks. A simple defense mechanism is successfully devised to reduce the fraction of successful adversarial samples. Finally, we recommend that future researchers who develop time series classification models incorporate adversarial samples into their training data sets to improve resilience to adversarial samples. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
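The Fast Gradient Sign Method that the abstract above uses as a baseline can be sketched in a few lines for a logistic-regression classifier, where the loss gradient has a closed form (the model and data here are toy stand-ins, not the paper's 1-NN DTW or FCN models):

```python
import numpy as np

def fgsm_attack(x, y, w, b, eps):
    """Fast Gradient Sign Method against a logistic-regression
    classifier p = sigmoid(w.x + b): perturb x by eps in the
    direction of the sign of the cross-entropy loss gradient,
    which in closed form is (p - y) * w."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    grad_x = (p - y) * w                 # d(loss)/dx
    return x + eps * np.sign(grad_x)

w = np.array([1.0, -2.0])
b = 0.0
x = np.array([0.5, 0.2])                 # score w.x + b = 0.1 -> class 1
x_adv = fgsm_attack(x, y=1.0, w=w, b=b, eps=0.2)
print(w @ x_adv + b)                     # score pushed negative: -0.5
```

The small, sign-bounded perturbation flips the prediction, which is exactly the vulnerability such attacks exploit; the paper's contribution is generating such samples in a black-box setting via a distilled surrogate.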
59. Auto-PyTorch: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL.
- Author
-
Zimmer, Lucas, Lindauer, Marius, and Hutter, Frank
- Subjects
DEEP learning, COMPUTER architecture, MACHINE learning, TASK analysis
- Abstract
While early AutoML frameworks focused on optimizing traditional ML pipelines and their hyperparameters, a recent trend in AutoML is to focus on neural architecture search. In this paper, we introduce Auto-PyTorch, which brings together the best of these two worlds by jointly and robustly optimizing the network architecture and the training hyperparameters to enable fully automated deep learning (AutoDL). Auto-PyTorch achieves state-of-the-art performance on several tabular benchmarks by combining multi-fidelity optimization with portfolio construction for warmstarting and ensembling of deep neural networks (DNNs) and common baselines for tabular data. To thoroughly study our assumptions on how to design such an AutoDL system, we additionally introduce a new benchmark on learning curves for DNNs, dubbed LCBench, and run extensive ablation studies of the full Auto-PyTorch on typical AutoML benchmarks, eventually showing that Auto-PyTorch performs better than several state-of-the-art competitors. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
60. Partially-Connected Neural Architecture Search for Reduced Computational Redundancy.
- Author
-
Xu, Yuhui, Xie, Lingxi, Dai, Wenrui, Zhang, Xiaopeng, Chen, Xin, Qi, Guo-Jun, Xiong, Hongkai, and Tian, Qi
- Subjects
BILEVEL programming, HIGH speed trains, ERROR rates, GRAPHICS processing units, REDUNDANCY in engineering, COMPUTER architecture
- Abstract
Differentiable architecture search (DARTS) enables effective neural architecture search (NAS) using gradient descent, but suffers from high memory and computational costs. In this paper, we propose a novel approach, namely Partially-Connected DARTS (PC-DARTS), to achieve efficient and stable neural architecture search by reducing the channel and spatial redundancies of the super-network. At the channel level, partial channel connection is presented to randomly sample a small subset of channels for operation selection to accelerate the search process and suppress the over-fitting of the super-network. A side operation is introduced for bypassing (non-sampled) channels to guarantee the performance of searched architectures under extremely low sampling rates. At the spatial level, input features are down-sampled to eliminate spatial redundancy and enhance the efficiency of the mixed computation for operation selection. Furthermore, edge normalization is developed to maintain the consistency of edge selection based on channel sampling with the architectural parameters for edges. Theoretical analysis shows that partial channel connection and parameterized side operation are equivalent to regularizing the super-network on the weights and architectural parameters during bilevel optimization. Experimental results demonstrate that the proposed approach achieves higher search speed and training stability than DARTS. PC-DARTS obtains a top-1 error rate of 2.55 percent on CIFAR-10 with 0.07 GPU-days for architecture search, and a state-of-the-art top-1 error rate of 24.1 percent on ImageNet (under the mobile setting) within 2.8 GPU-days. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
61. Domain Knowledge Alleviates Adversarial Attacks in Multi-Label Classifiers.
- Author
-
Melacci, Stefano, Ciravegna, Gabriele, Sotgiu, Angelo, Demontis, Ambra, Biggio, Battista, Gori, Marco, and Roli, Fabio
- Subjects
MARGINAL distributions, FIRST-order logic, EPISTEMIC logic, DATA distribution, NAIVE Bayes classification, SUPERVISED learning, MACHINE learning
- Abstract
Adversarial attacks on machine learning-based classifiers, along with defense mechanisms, have been widely studied in the context of single-label classification problems. In this paper, we shift the attention to multi-label classification, where the availability of domain knowledge on the relationships among the considered classes may offer a natural way to spot incoherent predictions, i.e., predictions associated with adversarial examples lying outside of the training data distribution. We explore this intuition in a framework in which first-order logic knowledge is converted into constraints and injected into a semi-supervised learning problem. Within this setting, the constrained classifier learns to fulfill the domain knowledge over the marginal distribution, and can naturally reject samples with incoherent predictions. Even though our method does not exploit any knowledge of attacks during training, our experimental analysis surprisingly unveils that domain-knowledge constraints can help detect adversarial examples effectively, especially if such constraints are not known to the attacker. We show how to implement an adaptive attack exploiting knowledge of the constraints and, in a specifically-designed setting, we provide experimental comparisons with popular state-of-the-art attacks. We believe that our approach may provide a significant step towards designing more robust multi-label classifiers. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
62. Improved Normalized Cut for Multi-View Clustering.
- Author
-
Zhong, Guo and Pun, Chi-Man
- Subjects
MATRIX decomposition, LINEAR programming, DATA structures, SOFTWARE measurement
- Abstract
Spectral clustering (SC) algorithms have been successful in discovering meaningful patterns since they can group arbitrarily shaped data structures. Traditional SC approaches typically consist of two sequential stages, i.e., performing spectral decomposition of an affinity matrix and then rounding the relaxed continuous clustering result into a binary indicator matrix. However, such a two-stage process could make the obtained binary indicator matrix severely deviate from the ground-truth one, because the former step is not devoted to achieving an optimal clustering result. To alleviate this issue, this paper presents a general joint framework to simultaneously learn the optimal continuous and binary indicator matrices for multi-view clustering, which also has the ability to tackle the conventional single-view case. Specifically, we provide a theoretical proof for the proposed method. Furthermore, an effective alternate updating algorithm is developed to optimize the corresponding complex objective. A number of empirical results on different benchmark datasets demonstrate that the proposed method outperforms several state-of-the-art methods in terms of six clustering metrics. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
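The two-stage pipeline this paper seeks to improve upon (spectral decomposition, then rounding the relaxed solution into a binary indicator) can be sketched for the two-cluster, single-view case, where rounding reduces to thresholding the Fiedler vector of the normalized Laplacian (a standard textbook sketch, not the authors' joint method):

```python
import numpy as np

def two_stage_spectral_bipartition(A):
    """Classic two-stage normalized cut for 2 clusters:
    (1) spectral step: eigendecompose the symmetric normalized
        Laplacian L = I - D^{-1/2} A D^{-1/2};
    (2) rounding step: threshold the Fiedler (2nd-smallest)
        eigenvector at zero to get a binary cluster indicator."""
    d = A.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(A)) - d_inv_sqrt @ A @ d_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(L)      # ascending eigenvalues
    fiedler = eigvecs[:, 1]
    return (fiedler > 0).astype(int)          # relaxed -> binary

# Two cliques {0,1,2} and {3,4,5} joined by a single bridge edge.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
labels = two_stage_spectral_bipartition(A)
print(labels)   # nodes 0-2 in one cluster, 3-5 in the other
```

The independent rounding in step (2) is exactly what the paper argues can drive the binary indicator away from the optimum, motivating its joint learning of both matrices.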
63. Syntax Customized Video Captioning by Imitating Exemplar Sentences.
- Author
-
Yuan, Yitian, Ma, Lin, and Zhu, Wenwu
- Subjects
SYNTAX (Grammar), VIDEO compression, SEMANTICS, RECURRENT neural networks
- Abstract
Enhancing the diversity of sentences to describe video contents is an important problem arising in recent video captioning research. In this paper, we explore this problem from a novel perspective of customizing video captions by imitating exemplar sentence syntaxes. Specifically, given a video and any syntax-valid exemplar sentence, we introduce a new task of Syntax Customized Video Captioning (SCVC) aiming to generate one caption which not only semantically describes the video contents but also syntactically imitates the given exemplar sentence. To tackle the SCVC task, we propose a novel video captioning model, where a hierarchical sentence syntax encoder is first designed to extract the syntactic structure of the exemplar sentence, then a syntax conditioned caption decoder is devised to generate the syntactically structured caption expressing video semantics. As no syntax-customized ground-truth video captions are available, we tackle this challenge by proposing a new training strategy, which leverages the traditional pairwise video captioning data and our collected exemplar sentences to accomplish the model learning. Extensive experiments, in terms of semantic, syntactic, fluency, and diversity evaluations, clearly demonstrate our model capability to generate syntax-varied and semantics-coherent video captions that well imitate different exemplar sentences with enriched diversities. Code is available at https://github.com/yytzsy/Syntax-Customized-Video-Captioning. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
64. Orientation Keypoints for 6D Human Pose Estimation.
- Author
-
Fisch, Martin and Clark, Ronald
- Subjects
COMPUTER-generated imagery, RANGE of motion of joints, POSE estimation (Computer vision), POINT set theory, ROTATIONAL motion, COMPUTER vision, MOTION capture (Human mechanics)
- Abstract
Most realtime human pose estimation approaches are based on detecting joint positions. Using the detected joint positions, the yaw and pitch of the limbs can be computed. However, the roll along the limb, which is critical for applications such as sports analysis and computer animation, cannot be computed as this axis of rotation remains unobserved. In this paper we therefore introduce orientation keypoints, a novel approach for estimating the full position and rotation of skeletal joints, using only single-frame RGB images. Inspired by how motion-capture systems use a set of point markers to estimate full bone rotations, our method uses virtual markers to generate sufficient information to accurately infer rotations with simple post processing. The rotation predictions improve upon the best reported mean error for joint angles by 48% and achieve 93% accuracy across 15 bone rotations. The method also improves the current state-of-the-art results for joint positions by 14% as measured by MPJPE on the principal dataset, and generalizes well to in-the-wild datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
65. Point Cloud Instance Segmentation With Semi-Supervised Bounding-Box Mining.
- Author
-
Liao, Yongbin, Zhu, Hongyuan, Zhang, Yanggang, Ye, Chuangguan, Chen, Tao, and Fan, Jiayuan
- Subjects
POINT cloud, SUPERVISED learning, DEEP learning
- Abstract
Point cloud instance segmentation has achieved huge progress with the emergence of deep learning. However, these methods are usually data-hungry with expensive and time-consuming dense point cloud annotations. Unlabeled or weakly labeled data, which could alleviate this annotation cost, remain less explored in the task. In this paper, we introduce the first semi-supervised point cloud instance segmentation framework (SPIB) using both labeled and unlabeled bounding boxes as supervision. To be specific, our SPIB architecture involves a two-stage learning procedure. For stage one, a bounding box proposal generation network is trained under a semi-supervised setting with perturbation consistency regularization (SPCR). The regularization works by enforcing an invariance of the bounding box predictions over different perturbations applied to the input point clouds, to provide self-supervision for network learning. For stage two, the bounding box proposals with SPCR are grouped into some subsets, and the instance masks are mined inside each subset with a novel semantic propagation module and a property consistency graph module. Moreover, we introduce a novel occupancy ratio guided refinement module to refine the instance masks. Extensive experiments on the challenging ScanNet v2 dataset demonstrate our method can achieve competitive performance compared with the recent fully-supervised methods. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
66. Generalizing Correspondence Analysis for Applications in Machine Learning.
- Author
-
Hsu, Hsiang, Salamatian, Salman, and Calmon, Flavio P.
- Subjects
ARTIFICIAL neural networks, MACHINE learning, LEARNING problems
- Abstract
Correspondence analysis (CA) is a multivariate statistical tool used to visualize and interpret data dependencies by finding maximally correlated embeddings of pairs of random variables. CA has found applications in fields ranging from epidemiology to social sciences. However, current methods for CA do not scale to large, high-dimensional datasets. In this paper, we provide a novel interpretation of CA in terms of an information-theoretic quantity called the principal inertia components. We show that estimating the principal inertia components, which consists in solving a functional optimization problem over the space of finite variance functions of two random variables, is equivalent to performing CA. We then leverage this insight to design algorithms to perform CA at scale. Specifically, we demonstrate how the principal inertia components can be reliably approximated from data using deep neural networks. Finally, we show how the maximally correlated embeddings of pairs of random variables in CA further play a central role in several learning problems including multi-view and multi-modal learning methods and visualization of classification boundaries. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
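Classical CA itself, which the paper above reinterprets through principal inertia components, can be sketched for small tables as an SVD of the standardized residuals of a contingency table (the textbook small-sample procedure, not the authors' neural estimator; the toy table is ours):

```python
import numpy as np

def correspondence_analysis(N, k=2):
    """Classical CA of a contingency table N: SVD of the matrix of
    standardized residuals S = D_r^{-1/2} (P - r c^T) D_c^{-1/2},
    where P = N / N.sum() and r, c are its row/column margins.
    Returns row and column principal coordinates and singular values."""
    P = N / N.sum()
    r = P.sum(axis=1)                      # row margins
    c = P.sum(axis=0)                      # column margins
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U[:, :k] * s[:k]) / np.sqrt(r)[:, None]
    cols = (Vt.T[:, :k] * s[:k]) / np.sqrt(c)[:, None]
    return rows, cols, s

# Toy survey: counts of (age group x preferred medium).
N = np.array([[30, 10, 5],
              [10, 30, 10],
              [5, 10, 30]], dtype=float)
rows, cols, s = correspondence_analysis(N)
print(s[:2])   # singular values: the canonical correlations
```

The singular values are the maximal correlations between embeddings of the two variables, which is the quantity the paper's neural approach estimates at scale when the table is too large to form explicitly.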
67. Unsupervised Grouped Axial Data Modeling via Hierarchical Bayesian Nonparametric Models With Watson Distributions.
- Author
-
Fan, Wentao, Yang, Lin, and Bouguila, Nizar
- Subjects
DATA modeling, WATSON (Computer), INFERENTIAL statistics, IMAGE analysis, MATHEMATICAL optimization, GENE expression, MACHINE learning
- Abstract
This paper aims at proposing an unsupervised hierarchical nonparametric Bayesian framework for modeling axial data (i.e., observations are axes of direction) that can be partitioned into multiple groups, where each observation within a group is sampled from a mixture of Watson distributions with an infinite number of components that are allowed to be shared across different groups. First, we propose a hierarchical nonparametric Bayesian model for modeling grouped axial data based on the hierarchical Pitman-Yor process mixture model of Watson distributions. Then, we demonstrate that by setting the discount parameters of the proposed model to 0, another hierarchical nonparametric Bayesian model based on hierarchical Dirichlet process can be derived for modeling axial data. To learn the proposed models, we systematically develop a closed-form optimization algorithm based on the collapsed variational Bayes (CVB) inference. Furthermore, to ensure the convergence of the proposed learning algorithm, an annealing mechanism is introduced to the framework of CVB inference, leading to an averaged collapsed variational Bayes inference strategy. The merits of the proposed models for modeling grouped axial data are demonstrated through experiments on both synthetic data and real-world applications involving gene expression data clustering and depth image analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
68. AvatarMe ++ : Facial Shape and BRDF Inference With Photorealistic Rendering-Aware GANs.
- Author
-
Lattas, Alexandros, Moschoglou, Stylianos, Ploumpis, Stylianos, Gecer, Baris, Ghosh, Abhijeet, and Zafeiriou, Stefanos
- Subjects
GENERATIVE adversarial networks, FACE, TASK analysis
- Abstract
Over the last years, with the advent of Generative Adversarial Networks (GANs), many face analysis tasks have accomplished astounding performance, with applications including, but not limited to, face generation and 3D face reconstruction from a single “in-the-wild” image. Nevertheless, to the best of our knowledge, there is no method which can produce render-ready high-resolution 3D faces from “in-the-wild” images and this can be attributed to the: (a) scarcity of available data for training, and (b) lack of robust methodologies that can successfully be applied on very high-resolution data. In this paper, we introduce the first method that is able to reconstruct photorealistic render-ready 3D facial geometry and BRDF from a single “in-the-wild” image. To achieve this, we capture a large dataset of facial shape and reflectance, which we have made public. Moreover, we define a fast and photorealistic differentiable rendering methodology with accurate facial skin diffuse and specular reflection, self-occlusion and subsurface scattering approximation. With this, we train a network that disentangles the facial diffuse and specular reflectance components from a mesh and texture with baked illumination, scanned or reconstructed with a 3DMM fitting method. As we demonstrate in a series of qualitative and quantitative experiments, our method outperforms existing methods by a significant margin and reconstructs authentic, 4K by 6K-resolution 3D faces from a single low-resolution image, that are ready to be rendered in various applications and bridge the uncanny valley. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
69. Unsupervised Domain Adaptation of Deep Networks for ToF Depth Refinement.
- Author
-
Agresti, Gianluca, Schafer, Henrik, Sartor, Piergiorgio, Incesu, Yalcin, and Zanuttigh, Pietro
- Subjects
NOISE control, DEEP learning, MACHINE learning, FREQUENCY-domain analysis, NOISE, MACHINE translating
- Abstract
Depth maps acquired with ToF cameras have a limited accuracy due to the high noise level and to the multi-path interference. Deep networks can be used for refining ToF depth, but their training requires real world acquisitions with ground truth, which is complex and expensive to collect. A possible workaround is to train networks on synthetic data, but the domain shift between the real and synthetic data reduces performance. In this paper, we propose three approaches to perform unsupervised domain adaptation of a depth denoising network from synthetic to real data. These approaches act at the input, feature, and output levels of the network, respectively. The first approach uses domain translation networks to transform labeled synthetic ToF data into a representation closer to real data, which is then used to train the denoiser. The second approach tries to align the network internal features related to synthetic and real data. The third approach uses an adversarial loss, implemented with a discriminator trained to recognize the ground truth statistic, to train the denoiser on unlabeled real data. Experimental results show that the considered approaches are able to outperform other state-of-the-art techniques and achieve superior denoising performance. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
70. Pyramidal Semantic Correspondence Networks.
- Author
-
Jeon, Sangryul, Kim, Seungryong, Min, Dongbo, and Sohn, Kwanghoon
- Subjects
AFFINE transformations, DEGREES of freedom, PYRAMIDS, COMPUTER architecture, FEATURE extraction
- Abstract
This paper presents a deep architecture, called pyramidal semantic correspondence networks (PSCNet), that estimates locally-varying affine transformation fields across semantically similar images. To deal with large appearance and shape variations that commonly exist among different instances within the same object category, we leverage a pyramidal model where the affine transformation fields are progressively estimated in a coarse-to-fine manner so that the smoothness constraint is naturally imposed. Different from the previous methods which directly estimate global or local deformations, our method first starts to estimate the transformation from an entire image and then progressively increases the degree of freedom of the transformation by dividing coarse cells into finer ones. To this end, we propose two spatial pyramid models by dividing an image in a form of quad-tree rectangles or into multiple semantic elements of an object. Additionally, to overcome the limitation of insufficient training data, a novel weakly-supervised training scheme is introduced that generates progressively evolving supervisions through the spatial pyramid models by leveraging a correspondence consistency across image pairs. Extensive experimental results on various benchmarks including TSS, Proposal Flow-WILLOW, Proposal Flow-PASCAL, Caltech-101, and SPair-71k demonstrate that the proposed method outperforms the latest methods for dense semantic correspondence. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
71. Surface Normals and Shape From Water.
- Author
-
Kuo, Meng-Yu Jennifer, Murai, Satoshi, Kawahara, Ryo, Nobuhara, Shohei, and Nishino, Ko
- Subjects
NEAR infrared radiation, GEOMETRIC surfaces, IMAGING systems, LIGHT absorption, COMPUTATIONAL photography
- Abstract
In this paper, we introduce a novel method for reconstructing surface normals and depth of dynamic objects in water. Past shape recovery methods have leveraged various visual cues for estimating shape (e.g., depth) or surface normals. Methods that estimate both compute one from the other. We show that these two geometric surface properties can be simultaneously recovered for each pixel when the object is observed underwater. Our key idea is to leverage multi-wavelength near-infrared light absorption along different underwater light paths in conjunction with surface shading. Our method can handle both Lambertian and non-Lambertian surfaces. We derive a principled theory for this surface normals and shape from water method and a practical calibration method for determining its imaging parameter values. By construction, the method can be implemented as a one-shot imaging system. We prototype both an off-line and a video-rate imaging system and demonstrate the effectiveness of the method on a number of real-world static and dynamic objects. The results show that the method can recover intricate surface features that are otherwise inaccessible. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
72. Adaptive Temporal Difference Learning With Linear Function Approximation.
- Author
-
Sun, Tao, Shen, Han, Chen, Tianyi, and Li, Dongsheng
- Subjects
MACHINE learning, TASK analysis, MARKOV processes, REINFORCEMENT learning, APPROXIMATION algorithms
- Abstract
This paper revisits the temporal difference (TD) learning algorithm for the policy evaluation tasks in reinforcement learning. Typically, the performance of TD(0) and TD($\lambda$) is very sensitive to the choice of stepsizes. Oftentimes, TD(0) suffers from slow convergence. Motivated by the tight link between the TD(0) learning algorithm and the stochastic gradient methods, we develop a provably convergent adaptive projected variant of the TD(0) learning algorithm with linear function approximation that we term AdaTD(0). In contrast to TD(0), AdaTD(0) is robust or less sensitive to the choice of stepsizes. Analytically, we establish that to reach an $\epsilon$ accuracy, the number of iterations needed is $\tilde{O}(\epsilon^{-2}\ln^4\frac{1}{\epsilon}/\ln^4\frac{1}{\rho})$ in the general case, where $\rho$ represents the speed at which the underlying Markov chain converges to the stationary distribution. This implies that the iteration complexity of AdaTD(0) is no worse than that of TD(0) in the worst case. When the stochastic semi-gradients are sparse, we provide theoretical acceleration of AdaTD(0). Going beyond TD(0), we develop an adaptive variant of TD($\lambda$), which is referred to as AdaTD($\lambda$). Empirically, we evaluate the performance of AdaTD(0) and AdaTD($\lambda$) on several standard reinforcement learning tasks, which demonstrate the effectiveness of our new approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
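The baseline the paper above builds on, TD(0) with linear function approximation, can be sketched on a tiny Markov chain (a plain fixed-stepsize TD(0); AdaTD(0)'s adaptive projected update is not reproduced here, and the toy chain is ours):

```python
import numpy as np

def td0_linear(phi, P, rewards, gamma, alpha, steps, seed=0):
    """TD(0) with linear value approximation V(s) = phi[s] @ theta.
    Simulates the chain under transition matrix P and applies
    theta += alpha * (r + gamma*V(s') - V(s)) * phi[s]."""
    rng = np.random.default_rng(seed)
    n = len(P)
    theta = np.zeros(phi.shape[1])
    s = 0
    for _ in range(steps):
        s_next = rng.choice(n, p=P[s])
        td_error = rewards[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
        theta += alpha * td_error * phi[s]
        s = s_next
    return theta

# 2-state chain, tabular features (phi = identity), reward 1 in state 0.
# Bellman equation gives true values V = [5.5, 4.5] for gamma = 0.9.
P = np.array([[0.5, 0.5], [0.5, 0.5]])
phi = np.eye(2)
theta = td0_linear(phi, P, rewards=np.array([1.0, 0.0]),
                   gamma=0.9, alpha=0.02, steps=50_000)
print(theta)   # approaches the true values [5.5, 4.5]
```

The fixed `alpha` here is exactly the sensitivity the paper targets: too large and the iterates oscillate, too small and convergence stalls, which motivates the adaptive stepsizes of AdaTD(0).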
73. Estimation of Wetness and Color from a Single Multispectral Image.
- Author
-
Okawa, Hiroki, Shimano, Mihoko, Asano, Yuta, Bise, Ryoma, Nishino, Ko, and Sato, Imari
- Subjects
MULTISPECTRAL imaging, MULTIPLE scattering (Physics), THEMATIC mapper satellite, COMPUTER vision, AUTONOMOUS vehicles, IMAGE color analysis, HUMANOID robots
- Abstract
Recognizing wet surfaces and their degrees of wetness is essential for many computer vision applications. Surface wetness can inform autonomous vehicles of slippery spots on a road, humanoid robots of muddy areas of a trail, and us of the freshness of groceries. The fact that surfaces darken when wet, i.e., monochromatic appearance change, has been modeled to recognize wet surfaces in the past. In this paper, we show that color change, particularly in its spectral behavior, carries rich information about surface wetness. We first derive an analytical spectral appearance model of wet surfaces that expresses the characteristic spectral sharpening due to multiple scattering and absorption in the surface. We present a novel method for estimating key parameters of this spectral appearance model, which enables the recovery of the original surface color and the degree of wetness from a single multispectral image. Applied to a multispectral image, the method estimates the spatial map of wetness together with the dry spectral distribution of the surface. To our knowledge, this is the first work to model and leverage the spectral characteristics of wet surfaces to decipher their appearance. We conduct comprehensive experimental validation with a number of wet real surfaces. The results demonstrate the accuracy of our model and the effectiveness of our method for surface wetness and color estimation. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
74. Convolutional Networks with Dense Connectivity.
- Author
-
Huang, Gao, Liu, Zhuang, Pleiss, Geoff, Maaten, Laurens van der, and Weinberger, Kilian Q.
- Subjects
COMPUTER architecture ,CONVOLUTIONAL neural networks ,OBJECT recognition (Computer vision) ,DEEP learning - Abstract
Recent work has shown that convolutional networks can be substantially deeper, more accurate, and more efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with $L$ layers have $L$ connections, one between each layer and its subsequent layer, our network has $\frac{L(L+1)}{2}$ direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs to all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, encourage feature reuse, and substantially improve parameter efficiency. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state of the art on most of them, whilst requiring fewer parameters and less computation to achieve high performance. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
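The dense connectivity pattern from the abstract can be sketched framework-free; the toy `layer` functions below are stand-ins for the paper's convolution units, and only the wiring is faithful:

```python
def concat(feature_maps):
    # channel-wise concatenation, modeled here as flat list concatenation
    return [v for f in feature_maps for v in f]

def dense_block(x, layers):
    """Toy sketch of DenseNet connectivity: layer k receives the
    concatenation of the block input and all k-1 earlier outputs,
    and the block emits the concatenation of everything."""
    features = [x]
    for layer in layers:
        features.append(layer(concat(features)))
    return concat(features)

def num_direct_connections(L):
    # layer k receives k connections (from the input and the k-1
    # preceding layers), so the total is 1 + 2 + ... + L = L(L+1)/2
    return L * (L + 1) // 2
```

With three toy summing layers, the output carries the input plus every intermediate feature, matching the "all preceding layers as inputs" description.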
75. Future Frame Prediction Network for Video Anomaly Detection.
- Author
-
Luo, Weixin, Liu, Wen, Lian, Dongze, and Gao, Shenghua
- Subjects
INTRUSION detection systems (Computer security) ,ANOMALY detection (Computer security) ,FORECASTING ,VIDEO surveillance - Abstract
Anomaly detection in videos refers to the identification of events that do not conform to expected behavior. However, almost all existing methods cast this problem as the minimization of reconstruction errors on training data containing only normal events, which may lead to self-reconstruction and cannot guarantee a larger reconstruction error for an abnormal event. In this paper, we propose to formulate the video anomaly detection problem within a video prediction regime. We advocate that not all video prediction networks are suitable for video anomaly detection, and we introduce two principles for the design of a video prediction network for this task. Based on them, we elaborately design a video prediction network with appearance and motion constraints for video anomaly detection. Further, to promote the generalization of prediction-based video anomaly detection to novel scenes, we carefully investigate the use of meta learning within our framework, where our model can be quickly adapted to a new testing scene with only a few starting frames. Extensive experiments on both a toy dataset and three real datasets validate the effectiveness of our method in terms of robustness to the uncertainty in normal events and sensitivity to abnormal events. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
76. Towards a Unified Quadrature Framework for Large-Scale Kernel Machines.
- Author
-
Liu, Fanghui, Huang, Xiaolin, Chen, Yudong, and Suykens, Johan A. K.
- Subjects
NUMERICAL integration ,MONTE Carlo method ,MACHINERY - Abstract
In this paper, we develop a quadrature framework for large-scale kernel machines via a numerical integration representation. Considering that the integration domain and measure of typical kernels, e.g., Gaussian kernels and arc-cosine kernels, are fully symmetric, we leverage a numerical integration technique, deterministic fully symmetric interpolatory rules, to efficiently compute quadrature nodes and associated weights for kernel approximation. Thanks to the fully symmetric property, the applied interpolatory rules are able to reduce the number of needed nodes while retaining high approximation accuracy. Further, we randomize the above deterministic rules by classical Monte-Carlo sampling and control variates techniques, with two merits: 1) the proposed stochastic rules make the dimension of the feature mapping flexibly varying, such that we can control the discrepancy between the original and approximate kernels by tuning the dimension; 2) our stochastic rules have the nice statistical properties of unbiasedness and variance reduction. In addition, we elucidate the relationship between our deterministic/stochastic interpolatory rules and current typical quadrature-based rules for kernel approximation, thereby unifying these methods under our framework. Experimental results on several benchmark datasets show that our methods compare favorably with other representative kernel approximation based methods. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
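For context, the classical Monte-Carlo baseline that such quadrature frameworks generalize can be sketched with random Fourier features for the Gaussian kernel (a standard construction, not the paper's interpolatory rules): the kernel value is recovered as an inner product of randomized feature maps.

```python
import math
import random

def rff_features(x, ws, bs):
    """Monte-Carlo quadrature sketch: random Fourier features z(x) whose
    inner product z(x).z(y) is an unbiased estimate of the Gaussian
    kernel exp(-||x-y||^2 / 2), with ws ~ N(0, I) and bs ~ U[0, 2*pi]."""
    D = len(ws)
    scale = math.sqrt(2.0 / D)
    return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for w, b in zip(ws, bs)]
```

With enough quadrature nodes the approximation error shrinks like $O(1/\sqrt{D})$; the deterministic rules in the paper aim to need far fewer nodes for the same accuracy.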
77. Hierarchical Bayesian LSTM for Head Trajectory Prediction on Omnidirectional Images.
- Author
-
Yang, Li, Xu, Mai, Guo, Yichen, Deng, Xin, Gao, Fangyuan, and Guan, Zhenyu
- Subjects
BAYESIAN field theory ,GAUSSIAN distribution ,HEAD ,FORECASTING ,MAGNETIC recording heads - Abstract
When viewing omnidirectional images (ODIs), viewers can access different viewports via head movement (HM), which sequentially forms head trajectories in the spatial-temporal domain. Thus, head trajectories play a key role in modeling human attention on ODIs. In this paper, we establish a large-scale dataset collecting 21,600 head trajectories on 1,080 ODIs. By mining our dataset, we find two important factors influencing head trajectories, i.e., temporal dependency and subject-specific variance. Accordingly, we propose a novel approach integrating hierarchical Bayesian inference into a long short-term memory (LSTM) network for head trajectory prediction on ODIs, called HiBayes-LSTM. In HiBayes-LSTM, we develop a mechanism of Future Intention Estimation (FIE), which captures the temporal correlations from previous, current, and estimated future information, for predicting viewport transitions. Additionally, a training scheme called hierarchical Bayesian inference (HBI) is developed for modeling inter-subject uncertainty in HiBayes-LSTM. For HBI, we introduce a joint Gaussian distribution in a hierarchy to approximate the posterior distribution over network weights. By sampling subject-specific weights from the approximated posterior distribution, our HiBayes-LSTM approach can yield diverse viewport transitions among different subjects and obtain multiple head trajectories. Extensive experiments validate that our HiBayes-LSTM approach significantly outperforms 9 state-of-the-art approaches for trajectory prediction on ODIs and is successfully applied to predict saliency on ODIs. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
78. Ring and Radius Sampling Based Phasor Field Diffraction Algorithm for Non-Line-of-Sight Reconstruction.
- Author
-
Jiang, Deyang, Liu, Xiaochun, Luo, Jianwen, Liao, Zhengpeng, Velten, Andreas, and Lou, Xin
- Subjects
IMAGE reconstruction algorithms ,ALGORITHMS ,FOURIER transforms ,COMPUTATIONAL complexity ,AVALANCHE diodes ,NONLINEAR optics - Abstract
Non-Line-of-Sight (NLOS) imaging reconstructs occluded scenes based on indirect diffuse reflections. The computational complexity and memory consumption of existing NLOS reconstruction algorithms make them challenging to implement in real time. This paper presents a fast and memory-efficient phasor-field-diffraction-based NLOS reconstruction algorithm. In the proposed algorithm, the radial property of the Rayleigh-Sommerfeld diffraction (RSD) kernels, along with the linearity of the Fourier transform, is utilized to reconstruct the Fourier domain representations of RSD kernels using a set of kernel bases. Moreover, memory consumption is further reduced by sampling the kernel bases in the radial direction and constructing them at run time. According to the analysis, memory efficiency can be improved by as much as $220\times$. Experimental results show that, compared with the original RSD algorithm, the reconstruction time of the proposed algorithm is significantly reduced with little impact on the final imaging quality. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
79. SurRF: Unsupervised Multi-View Stereopsis by Learning Surface Radiance Field.
- Author
-
Zhang, Jinzhi, Ji, Mengqi, Wang, Guangyu, Xue, Zhiwei, Wang, Shengjin, and Fang, Lu
- Subjects
RADIANCE ,POINT cloud ,IMAGE reconstruction ,SURFACE texture ,SURFACE reconstruction - Abstract
The recent success of supervised multi-view stereopsis (MVS) relies on onerously collected real-world 3D data. While the latest differentiable rendering techniques enable unsupervised MVS, they are restricted to discretized (e.g., point cloud) or implicit geometric representations, suffering from either low integrity for textureless regions or reduced geometric detail for complex scenes. In this paper, we propose SurRF, an unsupervised MVS pipeline that learns a Surface Radiance Field, i.e., a radiance field defined on a continuous and explicit 2D surface. Our key insight is that, in a local region, the explicit surface can be gradually deformed from a continuous initialization along view-dependent camera rays by differentiable rendering. This enables us to define the radiance field only on a 2D deformable surface rather than in a dense volume of 3D space, leading to a compact representation while maintaining complete shape and realistic texture for large-scale complex scenes. We experimentally demonstrate that the proposed SurRF produces results competitive with the state of the art on various challenging real-world scenes, without any 3D supervision. Moreover, SurRF shows great potential in combining the advantages of meshes (scene manipulation), continuous surfaces (high geometric resolution), and radiance fields (realistic rendering). [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
80. Structure-Preserving Image Super-Resolution.
- Author
-
Ma, Cheng, Rao, Yongming, Lu, Jiwen, and Zhou, Jie
- Subjects
GENERATIVE adversarial networks ,JIGSAW puzzles ,HIGH resolution imaging ,IMAGE reconstruction - Abstract
Structures matter in single image super-resolution (SISR). Benefiting from generative adversarial networks (GANs), recent studies have promoted the development of SISR by recovering photo-realistic images. However, there are still undesired structural distortions in the recovered images. In this paper, we propose a structure-preserving super-resolution (SPSR) method to alleviate this issue while maintaining the merits of GAN-based methods for generating perceptually pleasant details. First, we propose SPSR with gradient guidance (SPSR-G), which exploits gradient maps of images to guide the recovery in two aspects. On the one hand, we restore high-resolution gradient maps with a gradient branch to provide additional structure priors for the SR process. On the other hand, we propose a gradient loss to impose a second-order restriction on the super-resolved images, which helps generative networks concentrate more on geometric structures. Second, since the gradient maps are handcrafted and may only capture limited aspects of structural information, we further extend SPSR-G by introducing a learnable neural structure extractor (NSE) to unearth richer local structures and provide stronger supervision for SR. We propose two self-supervised structure learning methods, contrastive prediction and solving jigsaw puzzles, to train the NSEs. Our methods are model-agnostic and can potentially be used with off-the-shelf SR networks. Experimental results on five benchmark datasets show that the proposed methods outperform state-of-the-art perceptual-driven SR methods under the LPIPS, PSNR, and SSIM metrics. Visual results demonstrate the superiority of our methods in restoring structures while generating natural SR images. Code is available at https://github.com/Maclory/SPSR. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
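The gradient-guidance idea can be illustrated with a minimal finite-difference gradient map and an L1 gradient loss (a simplified sketch with our own function names; the paper applies such losses inside a full GAN training pipeline):

```python
def gradient_map(img):
    """Horizontal/vertical finite differences of a grayscale image,
    given as a list of rows; edges are replicated."""
    H, W = len(img), len(img[0])
    gx = [[img[i][min(j + 1, W - 1)] - img[i][j] for j in range(W)] for i in range(H)]
    gy = [[img[min(i + 1, H - 1)][j] - img[i][j] for j in range(W)] for i in range(H)]
    return gx, gy

def gradient_loss(sr, hr):
    """Mean L1 distance between the gradient maps of the super-resolved
    image and the ground-truth image (a second-order structure penalty)."""
    gx_s, gy_s = gradient_map(sr)
    gx_h, gy_h = gradient_map(hr)
    n = len(sr) * len(sr[0])
    lx = sum(abs(a - b) for ra, rb in zip(gx_s, gx_h) for a, b in zip(ra, rb))
    ly = sum(abs(a - b) for ra, rb in zip(gy_s, gy_h) for a, b in zip(ra, rb))
    return (lx + ly) / n
```

The loss is zero when the two images share identical edge structure, regardless of their absolute intensities matching elsewhere in the objective.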
81. Seek-and-Hide: Adversarial Steganography via Deep Reinforcement Learning.
- Author
-
Pan, Wenwen, Yin, Yanling, Wang, Xinchao, Jing, Yongcheng, and Song, Mingli
- Subjects
CRYPTOGRAPHY ,REINFORCEMENT learning ,CONVOLUTIONAL neural networks - Abstract
The goal of image steganography is to hide a full-sized image, termed the secret, into another, termed the cover. Prior image steganography algorithms can conceal only one secret within one cover. In this paper, we propose an adaptive local image steganography (AdaSteg) system that allows for scale- and location-adaptive image steganography. By adaptively hiding the secret on a local scale, the proposed system makes the steganography more secure, and further enables multi-secret steganography within one single cover. Specifically, this is achieved via two stages, namely the adaptive patch selection stage and the secret encryption stage. Given a pair of secret and cover, the optimal local patch for concealment is first determined adaptively by exploiting deep reinforcement learning with the proposed steganography quality function and policy network. The secret image is then converted into a patch of encrypted noises, resembling the process of generating adversarial examples, which are further encoded into a local region of the cover to realize more secure steganography. Furthermore, we propose a novel criterion for the assessment of local steganography, and also collect a challenging dataset specialized for the task of image steganography, thus contributing a standardized benchmark for the area. Experimental results demonstrate that the proposed model yields results superior to the state of the art in both security and capacity. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
82. Mining Data Impressions From Deep Models as Substitute for the Unavailable Training Data.
- Author
-
Nayak, Gaurav Kumar, Mopuri, Konda Reddy, Jain, Saksham, and Chakraborty, Anirban
- Subjects
DATA mining ,OCEAN mining ,COMPUTER vision ,DISTILLATION - Abstract
Pretrained deep models hold their learnt knowledge in the form of model parameters. These parameters act as "memory" for the trained models and help them generalize well on unseen data. However, in the absence of training data, the utility of a trained model is limited to either inference or better initialization towards a target task. In this paper, we go further and extract synthetic data by leveraging the learnt model parameters. We dub these Data Impressions, which act as a proxy for the training data and can be used to realize a variety of tasks. They are useful in scenarios where only the pretrained models are available and the training data is not shared (e.g., due to privacy or sensitivity concerns). We show the applicability of data impressions in solving several computer vision tasks such as unsupervised domain adaptation, continual learning, and knowledge distillation. We also study the adversarial robustness of lightweight models trained via knowledge distillation using these data impressions. Further, we demonstrate the efficacy of data impressions in generating data-free Universal Adversarial Perturbations (UAPs) with better fooling rates. Extensive experiments performed on benchmark datasets demonstrate competitive performance achieved using data impressions in the absence of original training data. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
83. CyCoSeg: A Cyclic Collaborative Framework for Automated Medical Image Segmentation.
- Author
-
Medley, Daniela O., Santiago, Carlos, and Nascimento, Jacinto C.
- Subjects
ARTIFICIAL neural networks ,DIAGNOSTIC imaging ,COMPUTED tomography ,IMAGE segmentation ,COMPUTER vision ,LUNGS ,IMAGE processing - Abstract
Deep neural networks have been tremendously successful at segmenting objects in images. However, it has been shown that they still have limitations on challenging problems such as the segmentation of medical images. The main reason behind this lower success resides in the reduced size of the object in the image. In this paper, we overcome this limitation through a cyclic collaborative framework, CyCoSeg. The proposed framework is based on a deep active shape model (D-ASM), which provides prior information about the shape of the object, and a semantic segmentation network (SSN). These two models collaborate to reach the desired segmentation by influencing each other: the SSN helps the D-ASM identify relevant keypoints in the image through an Expectation Maximization formulation, while the D-ASM provides a segmentation proposal that guides the SSN. This cycle is repeated until both models converge. Extensive experimental evaluation shows that CyCoSeg boosts the performance of the baseline models, including several popular SSNs, while avoiding major architectural modifications. The effectiveness of our method is demonstrated on left ventricle segmentation on two benchmark datasets, where our approach achieves one of the most competitive results in segmentation accuracy. Furthermore, its generalization is demonstrated for lung and kidney segmentation in CT scans. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
84. Improved Variance Reduction Methods for Riemannian Non-Convex Optimization.
- Author
-
Han, Andi and Gao, Junbin
- Subjects
PRINCIPAL components analysis - Abstract
Variance reduction is popular in accelerating gradient descent and stochastic gradient descent for optimization problems defined on both Euclidean space and Riemannian manifolds. This paper further improves on existing variance reduction methods for non-convex Riemannian optimization, including R-SVRG and R-SRG/R-SPIDER, by providing a unified framework for batch-size adaptation. Such a framework is more general than existing work in that it considers retraction, vector transport, and mini-batch stochastic gradients. We show that the adaptive-batch variance reduction methods require lower gradient complexities for both general non-convex and gradient-dominated functions, under both finite-sum and online optimization settings. Moreover, under the new framework, we complete the analysis of R-SVRG and R-SRG, which is currently missing in the literature. We prove convergence of R-SVRG with a much simpler analysis, which leads to curvature-free complexity bounds. We also show improved results for R-SRG under double-loop convergence, which match the optimal complexities of R-SPIDER. In addition, we prove the first online complexity results for R-SVRG and R-SRG. Lastly, we discuss the potential of adapting batch size for non-smooth, constrained, and second-order Riemannian optimizers. Extensive experiments on a variety of applications support the analysis and claims in the paper. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
85. EdgeNets: Edge Varying Graph Neural Networks.
- Author
-
Isufi, Elvin, Gama, Fernando, and Ribeiro, Alejandro
- Subjects
CONVOLUTIONAL neural networks ,EUCLIDEAN domains ,COMPUTATIONAL complexity ,IMAGE encryption - Abstract
Driven by the outstanding performance of neural networks in the structured Euclidean domain, recent years have seen a surge of interest in developing neural networks for graphs and data supported on graphs. The graph is leveraged at each layer of the neural network as a parameterization to capture detail at the node level with a reduced number of parameters and computational complexity. Following this rationale, this paper puts forth a general framework that unifies state-of-the-art graph neural networks (GNNs) through the concept of the EdgeNet. An EdgeNet is a GNN architecture that allows different nodes to use different parameters to weigh the information of different neighbors. By extrapolating this strategy to more iterations between neighboring nodes, the EdgeNet learns edge- and neighbor-dependent weights to capture local detail. This is a general linear and local operation that a node can perform, and it encompasses under one formulation all existing graph convolutional neural networks (GCNNs) as well as graph attention networks (GATs). By writing different GNN architectures in a common language, EdgeNets highlight specific architectural advantages and limitations, while providing guidelines to improve their capacity without compromising their local implementation. For instance, we show that GCNNs have a parameter-sharing structure that induces permutation equivariance. This can be an advantage or a limitation, depending on the application. In cases where it is a limitation, we propose hybrid approaches and provide insights to develop several other solutions that promote parameter sharing without enforcing permutation equivariance. Another interesting conclusion is the unification of GCNNs and GATs, approaches that have so far been perceived as separate. In particular, we show that GATs are GCNNs on a graph that is learned from the features. This particularization opens the door to developing alternative attention mechanisms for improving discriminatory power. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
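The edge-varying operation at the heart of the abstract can be sketched as a single linear step (variable names are ours; the full architecture stacks such operations with nonlinearities and multiple iterations):

```python
def edge_varying_layer(adj, phi, x):
    """Sketch of one EdgeNet step: node i weighs each neighbor j with its
    own coefficient phi[i][j], nonzero only where adj[i][j] = 1. A plain
    graph convolution is the special case where rows of phi share weights."""
    n = len(x)
    return [sum(phi[i][j] * x[j] for j in range(n) if adj[i][j])
            for i in range(n)]
```

Because `phi` may differ per edge, the same formula covers GCNN-style shared filters and GAT-style learned per-neighbor attention weights.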
86. An Analysis of Super-Net Heuristics in Weight-Sharing NAS.
- Author
-
Yu, Kaicheng, Ranftl, Rene, and Salzmann, Mathieu
- Subjects
HEURISTIC ,SEARCH algorithms ,NETWORK-attached storage ,COMPUTER architecture ,TASK analysis - Abstract
Weight sharing promises to make neural architecture search (NAS) tractable even on commodity hardware. Existing methods in this space rely on a diverse set of heuristics to design and train the shared-weight backbone network, a.k.a. the super-net. Since heuristics substantially vary across different methods and have not been carefully studied, it is unclear to which extent they impact super-net training and hence the weight-sharing NAS algorithms. In this paper, we disentangle super-net training from the search algorithm, isolate 14 frequently-used training heuristics, and evaluate them over three benchmark search spaces. Our analysis uncovers that several commonly-used heuristics negatively impact the correlation between super-net and stand-alone performance, whereas simple, but often overlooked factors, such as proper hyper-parameter settings, are key to achieve strong performance. Equipped with this knowledge, we show that simple random search achieves competitive performance to complex state-of-the-art NAS algorithms when the super-net is properly trained. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
87. A(DP)$^2$SGD: Asynchronous Decentralized Parallel Stochastic Gradient Descent With Differential Privacy.
- Author
-
Xu, Jie, Zhang, Wei, and Wang, Fei
- Subjects
PRIVACY ,HETEROGENEOUS computing ,DEEP learning ,LEAKS (Disclosure of information) - Abstract
As deep learning models are usually massive and complex, distributed learning is essential for increasing training efficiency. Moreover, in many real-world application scenarios like healthcare, distributed learning can also keep the data local and protect privacy. Recently, the asynchronous decentralized parallel stochastic gradient descent (ADPSGD) algorithm has been proposed and demonstrated to be an efficient and practical strategy in which there is no central server, so that each computing node only communicates with its neighbors. Although no raw data are transmitted across different local nodes, there is still a risk of information leakage during the communication process that malicious participants could exploit. In this paper, we present a differentially private version of the asynchronous decentralized parallel SGD framework, A(DP)$^2$SGD for short, which maintains the communication efficiency of ADPSGD and prevents inference by malicious participants. Specifically, Rényi differential privacy is used to provide a tighter privacy analysis for our composite Gaussian mechanisms, while the convergence rate is consistent with the non-private version. Theoretical analysis shows that A(DP)$^2$SGD also converges at the optimal $\mathcal{O}(1/\sqrt{T})$ rate of SGD. Empirically, A(DP)$^2$SGD achieves model accuracy comparable to the differentially private version of synchronous SGD (SSGD) but runs much faster than SSGD in heterogeneous computing environments. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
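The Gaussian mechanism underlying DP-SGD-style training can be sketched per gradient (a generic textbook sketch with our own parameter names, not the paper's exact asynchronous decentralized protocol): clip the gradient's L2 norm, then add calibrated Gaussian noise.

```python
import math
import random

def dp_gradient(grad, clip=1.0, sigma=1.0, rng=random):
    """Sketch of a Gaussian mechanism on one gradient: scale the gradient
    so its L2 norm is at most `clip`, then add N(0, (sigma*clip)^2) noise
    to each coordinate. `sigma` controls the privacy/utility trade-off."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip / norm) if norm > 0 else 1.0
    return [g * scale + rng.gauss(0.0, sigma * clip) for g in grad]
```

Clipping bounds each participant's sensitivity, which is what makes the added noise yield a formal differential privacy guarantee under composition.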
88. A Simple Spectral Failure Mode for Graph Convolutional Networks.
- Author
-
Priebe, Carey E., Shen, Cencheng, Huang, Ningyuan, and Chen, Tianyi
- Subjects
FAILURE mode & effects analysis ,STATISTICAL learning ,REGULAR graphs ,MACHINE learning ,CONVOLUTIONAL neural networks ,MATHEMATICAL convolutions - Abstract
Neural networks have achieved remarkable successes in machine learning tasks. This has recently been extended to graph learning using neural networks. However, there is limited theoretical work on understanding how and when they perform well, especially relative to established statistical learning techniques such as spectral embedding. In this short paper, we present a simple generative model where an unsupervised graph convolutional network fails while adjacency spectral embedding succeeds. Specifically, the unsupervised graph convolutional network is unable to look beyond the first eigenvector in certain approximately regular graphs, thus missing inference signals in non-leading eigenvectors. The phenomenon is demonstrated by visual illustrations and comprehensive simulations. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
89. Source Data-Absent Unsupervised Domain Adaptation Through Hypothesis Transfer and Labeling Transfer.
- Author
-
Liang, Jian, Hu, Dapeng, Wang, Yunbo, He, Ran, and Feng, Jiashi
- Subjects
SUPERVISED learning ,OBJECT recognition (Computer vision) ,FEATURE extraction ,VISUAL accommodation ,HYPOTHESIS ,KNOWLEDGE transfer ,IMAGE reconstruction - Abstract
Unsupervised domain adaptation (UDA) aims to transfer knowledge from a related but different well-labeled source domain to a new unlabeled target domain. Most existing UDA methods require access to the source data, and thus are not applicable when the data are confidential and not shareable due to privacy concerns. This paper tackles a realistic setting in which only a classification model trained on the source data is available, rather than the source data itself. To effectively utilize the source model for adaptation, we propose a novel approach called Source HypOthesis Transfer (SHOT), which learns the feature extraction module for the target domain by fitting the target data features to the frozen source classification module (representing the classification hypothesis). Specifically, SHOT exploits both information maximization and self-supervised learning in training the feature extraction module, to ensure the target features are implicitly aligned with the features of unseen source data via the same hypothesis. Furthermore, we propose a new labeling transfer strategy, which separates the target data into two splits based on the confidence of predictions (labeling information), and then employs semi-supervised learning to improve the accuracy of less-confident predictions in the target domain. We denote labeling transfer as SHOT++ if the predictions are obtained by SHOT. Extensive experiments on both digit classification and object recognition tasks show that SHOT and SHOT++ achieve results surpassing or comparable to the state of the art, demonstrating the effectiveness of our approaches for various visual domain adaptation problems. Code will be available at https://github.com/tim-learn/SHOT-plus. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
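The information-maximization idea mentioned in the abstract (predictions that are individually confident yet globally diverse) can be sketched as follows; the function name and the `1e-12` smoothing constant are our own, and this is only the entropy-based part, not the full SHOT objective:

```python
import math

def information_maximization_loss(probs):
    """Sketch of an information-maximization objective: minimize the mean
    per-sample prediction entropy (confidence) while maximizing the
    entropy of the mean prediction (diversity across classes)."""
    n, k = len(probs), len(probs[0])
    def ent(p):
        return -sum(pi * math.log(pi + 1e-12) for pi in p)
    cond = sum(ent(p) for p in probs) / n                      # confidence term
    marg = ent([sum(p[j] for p in probs) / n for j in range(k)])  # diversity term
    return cond - marg
```

Confident-and-diverse predictions score lower than confident-but-collapsed ones, which is what discourages the adapted model from assigning all target samples to one class.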
90. Meta Balanced Network for Fair Face Recognition.
- Author
-
Wang, Mei, Zhang, Yaobin, and Deng, Weihong
- Subjects
FACE perception ,MACHINE learning ,AUTOMATIC differentiation ,HUMAN facial recognition software - Abstract
Although deep face recognition has achieved impressive progress in recent years, controversy has arisen regarding discrimination based on skin tone, calling into question the deployment of such systems in real-world scenarios. In this paper, we aim to systematically and scientifically study this bias from both the data and algorithm aspects. First, using the dermatologist-approved Fitzpatrick Skin Type classification system and the Individual Typology Angle, we contribute a benchmark called the Identity Shades (IDS) database, which effectively quantifies the degree of bias with respect to skin tone in existing face recognition algorithms and commercial APIs. Further, we provide two skin-tone-aware training datasets, the BUPT-Globalface dataset and the BUPT-Balancedface dataset, to remove bias in training data. Finally, to mitigate the algorithmic bias, we propose a novel meta-learning algorithm, called Meta Balanced Network (MBN), which learns adaptive margins in a large margin loss such that the model optimized by this loss can perform fairly across people with different skin tones. To determine the margins, our method optimizes a meta skewness loss on a clean and unbiased meta set and utilizes backward-on-backward automatic differentiation to perform a second-order gradient descent step on the current margins. Extensive experiments show that MBN successfully mitigates bias and learns more balanced performance for people with different skin tones in face recognition. The proposed datasets are available at http://www.whdeng.cn/RFW/index.html. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
91. Distilling Knowledge by Mimicking Features.
- Author
-
Wang, Guo-Hua, Ge, Yifan, and Wu, Jianxin
- Subjects
OBJECT recognition (Computer vision) ,FEATURE selection ,TEACHER training - Abstract
Knowledge distillation (KD) is a popular method to train efficient networks (“student”) with the help of high-capacity networks (“teacher”). Traditional methods use the teacher’s soft logits as extra supervision to train the student network. In this paper, we argue that it is more advantageous to make the student mimic the teacher’s features in the penultimate layer. Not only can the student directly learn more effective information from the teacher's features, but feature mimicking can also be applied to teachers trained without a softmax layer. Experiments show that it can achieve higher accuracy than traditional KD. To further facilitate feature mimicking, we decompose a feature vector into its magnitude and direction. We argue that the teacher should give more freedom to the student feature’s magnitude and let the student pay more attention to mimicking the feature direction. To meet this requirement, we propose a loss term based on locality-sensitive hashing (LSH). With the help of this new loss, our method indeed mimics feature directions more accurately, relaxes constraints on feature magnitudes, and achieves state-of-the-art distillation accuracy. We provide theoretical analyses of how LSH facilitates feature direction mimicking, and further extend feature mimicking to multi-label recognition and object detection. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
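The magnitude/direction decomposition can be illustrated with a cosine-style direction loss; note this is an illustrative stand-in of our own, since the paper's actual loss is built on locality-sensitive hashing:

```python
import math

def direction_mimic_loss(student_feat, teacher_feat, eps=1e-12):
    """Illustrative sketch: penalize only the angle between student and
    teacher feature vectors (1 - cosine similarity), leaving the
    student's feature magnitude unconstrained."""
    dot = sum(s * t for s, t in zip(student_feat, teacher_feat))
    ns = math.sqrt(sum(s * s for s in student_feat)) + eps
    nt = math.sqrt(sum(t * t for t in teacher_feat)) + eps
    return 1.0 - dot / (ns * nt)
```

A student feature pointing the same way as the teacher's incurs no penalty however its norm differs, which is exactly the freedom the abstract argues for.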
92. Counting People by Estimating People Flows.
- Author
-
Liu, Weizhe, Salzmann, Mathieu, and Fua, Pascal
- Subjects
OPTICAL flow ,ACTIVE learning ,COUNTING ,DEEP learning ,VIDEO compression - Abstract
Modern methods for counting people in crowded scenes rely on deep networks to estimate people densities in individual images. As such, only very few take advantage of temporal consistency in video sequences, and those that do only impose weak smoothness constraints across consecutive frames. In this paper, we advocate estimating people flows across image locations between consecutive images and inferring the people densities from these flows instead of directly regressing them. This enables us to impose much stronger constraints encoding the conservation of the number of people. As a result, it significantly boosts performance without requiring a more complex architecture. Furthermore, it allows us to exploit the correlation between people flow and optical flow to further improve the results. We also show that leveraging people conservation constraints in both a spatial and temporal manner makes it possible to train a deep crowd counting model in an active learning setting with much fewer annotations. This significantly reduces the annotation cost while still leading to similar performance to the full supervision case. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
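The people-conservation constraint in the abstract above can be illustrated on a toy grid: densities are not regressed directly but read off the flows, so the head-count is conserved by construction. The 3-cell example and function name are invented for illustration; the paper operates on dense flow maps predicted by a network.

```python
import numpy as np

def density_from_flows(flows):
    """flows[i, j] = estimated number of people moving from cell i to
    cell j between two frames (staying put is flows[i, i]).

    The density at time t is the total flow leaving each cell; the
    density at t+1 is the total flow entering each cell, so the total
    count is preserved by construction.
    """
    density_t = flows.sum(axis=1)   # people leaving (or staying in) i
    density_t1 = flows.sum(axis=0)  # people arriving at j
    return density_t, density_t1

# toy 3-cell scene: two people in cell 0, one walks to cell 1
flows = np.array([[1.0, 1.0, 0.0],
                  [0.0, 0.5, 0.0],
                  [0.0, 0.0, 0.0]])
d_t, d_t1 = density_from_flows(flows)
assert np.isclose(d_t.sum(), d_t1.sum())  # conservation of people
```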
93. Learning Semantic Segmentation of Large-Scale Point Clouds With Random Sampling.
- Author
-
Hu, Qingyong, Yang, Bo, Xie, Linhai, Rosa, Stefano, Guo, Yulan, Wang, Zhihua, Trigoni, Niki, and Markham, Andrew
- Subjects
POINT cloud ,SAMPLING (Process) ,STATISTICAL sampling ,SEMANTICS - Abstract
We study the problem of efficient semantic segmentation of large-scale 3D point clouds. Because they rely on expensive sampling techniques or computationally heavy pre-/post-processing steps, most existing approaches can only be trained on and operate over small-scale point clouds. In this paper, we introduce RandLA-Net, an efficient and lightweight neural architecture that directly infers per-point semantics for large-scale point clouds. The key to our approach is to use random point sampling instead of more complex point selection approaches. Although remarkably computation- and memory-efficient, random sampling can discard key features by chance. To overcome this, we introduce a novel local feature aggregation module to progressively increase the receptive field for each 3D point, thereby effectively preserving geometric details. Comparative experiments show that our RandLA-Net can process 1 million points in a single pass up to 200× faster than existing approaches. Moreover, extensive experiments on five large-scale point cloud datasets, including Semantic3D, SemanticKITTI, Toronto3D, NPM3D and S3DIS, demonstrate the state-of-the-art semantic segmentation performance of our RandLA-Net. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
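The efficiency argument above rests on the fact that random downsampling costs almost nothing per point, while alternatives such as farthest-point sampling scale quadratically. A minimal sketch of the sampling step only (not RandLA-Net's full pipeline, which pairs it with the local feature aggregation module):

```python
import numpy as np

def random_sample(points, n_keep, seed=0):
    """Random point-cloud downsampling: cost is independent of any
    pairwise-distance computation, unlike farthest-point sampling."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), size=n_keep, replace=False)
    return points[idx]

# a large synthetic cloud downsampled in one cheap step
cloud = np.random.default_rng(1).standard_normal((1_000_000, 3))
sub = random_sample(cloud, 4096)
assert sub.shape == (4096, 3)
```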
94. Recurrent Multi-Frame Deraining: Combining Physics Guidance and Adversarial Learning.
- Author
-
Yang, Wenhan, Tan, Robby T., Feng, Jiashi, Wang, Shiqi, Cheng, Bin, and Liu, Jiaying
- Subjects
IMAGE color analysis ,PHYSICS - Abstract
Existing video rain removal methods mainly focus on rain streak removal and are trained solely on synthetic data, neglecting more complex degradation factors, e.g., rain accumulation, as well as the prior knowledge present in real rain data. Thus, in this paper, we build a more comprehensive rain model with several degradation factors and construct a novel two-stage video rain removal method that combines the power of synthetic videos and real data. Specifically, a novel two-stage progressive network is proposed: recovery guided by a physics model, and further restoration by adversarial learning. The first stage performs an inverse recovery process guided by our proposed rain model, producing an initially estimated background frame from the input rain frame. The second stage employs adversarial learning to refine the result, i.e., recovering the overall color and illumination distributions of the frame and the background details that the first stage fails to recover, and removing the artifacts the first stage generates. Furthermore, our more comprehensive rain model includes degradation factors, e.g., occlusion and rain accumulation, which appear in real scenes yet are ignored by existing methods. This model generates more realistic rain images, allowing our models to be trained and evaluated better. Extensive evaluations on synthetic and real videos show the effectiveness of our method in comparison to state-of-the-art methods. Our datasets, results and code are available at: https://github.com/flyywh/Recurrent-Multi-Frame-Deraining. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
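A toy version of the kind of comprehensive rain model the abstract describes, combining additive streaks with a haze-like accumulation (veiling) term. The form O = (1 − α)(B + S) + αA and all parameter names here are assumptions for illustration, not the paper's exact model.

```python
import numpy as np

def synthesize_rain(background, streaks, atm_light=0.9, alpha=0.3):
    """Toy rain model with streaks and rain accumulation:
        O = (1 - alpha) * (B + S) + alpha * A
    where S is an additive streak layer, alpha models the veiling
    effect of accumulated rain (similar to haze), and A is the
    atmospheric light.  All images are in [0, 1]."""
    rainy = (1.0 - alpha) * np.clip(background + streaks, 0.0, 1.0)
    rainy = rainy + alpha * atm_light
    return np.clip(rainy, 0.0, 1.0)

# pure accumulation over a black background lifts it toward atm_light
out = synthesize_rain(np.zeros((4, 4)), np.zeros((4, 4)))
assert np.allclose(out, 0.3 * 0.9)
```

A first-stage "inverse recovery" in the paper's sense would invert this model to get an initial background estimate before adversarial refinement.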
95. K-Shot Contrastive Learning of Visual Features With Multiple Instance Augmentations.
- Author
-
Xu, Haohang, Xiong, Hongkai, and Qi, Guo-Jun
- Subjects
VISUAL learning ,EIGENVALUES - Abstract
In this paper, we propose K-Shot Contrastive Learning (KSCL) of visual features, applying multiple augmentations to investigate the sample variations within individual instances. It aims to combine the advantages of inter-instance discrimination, learning discriminative features to distinguish between different instances, with intra-instance variation, matching queries against the variants of augmented samples over instances. In particular, for each instance it constructs an instance subspace to model how the significant factors of variation in the K-shot augmentations can be combined to form the variants of augmentations. Given a query, the most relevant variant of instances is then retrieved by projecting the query onto their subspaces to predict the positive instance class. This generalizes existing contrastive learning, which can be viewed as a special one-shot case. An eigenvalue decomposition is performed to configure the instance subspaces, and the embedding network can be trained end-to-end through the differentiable subspace configuration. Experimental results demonstrate that the proposed K-shot contrastive learning achieves superior performance to state-of-the-art unsupervised methods. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
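The instance-subspace construction above can be sketched as follows: stack the K augmented embeddings of one instance, take an orthonormal basis (a thin SVD stands in for the eigendecomposition), and score a query by the squared norm of its projection onto that basis. Function names are invented; this is a sketch of the retrieval geometry, not the KSCL training loop.

```python
import numpy as np

def instance_subspace(aug_embeddings, rank):
    """Orthonormal basis spanning K augmented embeddings (rows) of one
    instance; SVD is used here in place of the eigendecomposition."""
    U, _, _ = np.linalg.svd(aug_embeddings.T, full_matrices=False)
    return U[:, :rank]

def subspace_score(query, basis):
    """Similarity = squared norm of the unit query's projection onto
    the instance subspace (1.0 means the query lies in the span)."""
    q = query / np.linalg.norm(query)
    return float(np.linalg.norm(basis.T @ q) ** 2)

rng = np.random.default_rng(0)
d, K = 16, 4
augs = rng.standard_normal((K, d))  # K-shot augmentations of one instance
B = instance_subspace(augs, rank=2)
# a query lying exactly in its own one-shot subspace scores 1.0
score = subspace_score(augs[0], instance_subspace(augs[:1], rank=1))
assert np.isclose(score, 1.0)
```

With rank 1 and a single augmentation this reduces to cosine similarity squared, which is the "special one-shot case" the abstract mentions.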
96. Deep Non-Negative Matrix Factorization Architecture Based on Underlying Basis Images Learning.
- Author
-
Zhao, Yang, Wang, Huiyang, and Pei, Jihong
- Subjects
MATRIX decomposition ,NONNEGATIVE matrices ,SPARSE matrices ,PATTERN recognition systems ,LINEAR programming - Abstract
The non-negative matrix factorization (NMF) algorithm represents an original image as a linear combination of a set of basis images. This image representation method is in line with the idea of “parts constitute a whole” in human thinking. Existing deep NMF methods perform deep factorization on the coefficient matrix; in these methods, the basis images used to represent the original images are essentially obtained by factorizing the original images only once. To extract features reflecting the deep localization characteristics of images, a novel deep NMF architecture based on underlying basis images learning is proposed for the first time. The architecture learns the underlying basis images by deep factorization of the basis images matrix, and has strong interpretability. To implement this architecture, this paper proposes a deep non-negative basis matrix factorization algorithm to obtain the underlying basis images. The objective function is then established with an added regularization term that directly constrains the basis images matrix to obtain basis images with good local characteristics, yielding a regularized deep non-negative basis matrix factorization algorithm. A regularized deep nonlinear non-negative basis matrix factorization algorithm is also proposed to handle pattern recognition tasks with complex data. This paper also theoretically proves the convergence of the algorithm. Finally, experimental results show that the proposed deep NMF architecture based on underlying basis images learning obtains better recognition performance than other state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
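The "deep factorization on the basis images matrix" can be sketched with plain multiplicative-update NMF applied recursively to the basis matrix W, so the deepest factor holds the underlying basis images. The regularized and nonlinear variants from the abstract are omitted, and all names are illustrative.

```python
import numpy as np

def nmf(V, r, n_iter=200, seed=0):
    """Standard NMF with multiplicative updates: V ≈ W @ H."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], r)) + 1e-3
    H = rng.random((r, V.shape[1])) + 1e-3
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

def deep_basis_nmf(V, ranks):
    """Sketch of deep NMF on the basis-image matrix: factor V into
    W1 @ H, then factor W1 itself, and so on; the deepest W holds the
    'underlying basis images'."""
    W, H = nmf(V, ranks[0])
    bases = [W]
    for r in ranks[1:]:
        W, _ = nmf(W, r)
        bases.append(W)
    return bases, H

V = np.abs(np.random.default_rng(1).standard_normal((20, 30)))
bases, H = deep_basis_nmf(V, [8, 4])
assert bases[0].shape == (20, 8) and bases[1].shape == (20, 4)
assert (bases[1] >= 0).all() and (H >= 0).all()
```

Note how this differs from coefficient-side deep NMF: the recursion descends through W, not H, which is exactly the architectural distinction the abstract draws.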
97. Single Day Outdoor Photometric Stereo.
- Author
-
Hold-Geoffroy, Yannick, Gotardo, Paulo, and Lalonde, Jean-Francois
- Subjects
PHOTOMETRIC stereo ,EXTERIOR lighting ,DAYLIGHT ,GEOMETRIC surfaces ,SURFACE geometry - Abstract
Photometric Stereo (PS) under outdoor illumination remains a challenging, ill-posed problem due to insufficient variability in illumination. Months-long capture sessions are typically used in this setup, with little success on shorter, single-day time intervals. In this paper, we investigate the solution of outdoor PS over a single day, under different weather conditions. First, we investigate the relationship between weather and surface reconstructability in order to understand when natural lighting allows existing PS algorithms to work. Our analysis reveals that partially cloudy days improve the conditioning of the outdoor PS problem while sunny days do not allow the unambiguous recovery of surface normals from photometric cues alone. We demonstrate that calibrated PS algorithms can thus be employed to reconstruct Lambertian surfaces accurately under partially cloudy days. Second, we solve the ambiguity arising in clear days by combining photometric cues with prior knowledge on material properties, local surface geometry and the natural variations in outdoor lighting through a CNN-based, weakly-calibrated PS technique. Given a sequence of outdoor images captured during a single sunny day, our method robustly estimates the scene surface normals with unprecedented quality for the considered scenario. Our approach does not require precise geolocation and significantly outperforms several state-of-the-art methods on images with real lighting, showing that our CNN can combine efficiently learned priors and photometric cues available during a single sunny day. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
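The calibrated, Lambertian case that the abstract above finds tractable on partially cloudy days is classic photometric stereo: with known lighting directions L, solve I = L(ρn) per pixel by least squares; the albedo is the norm of the solution and the normal its direction. This toy single-pixel example is illustrative, not the paper's CNN-based weakly-calibrated method.

```python
import numpy as np

def lambertian_ps(intensities, lights):
    """Calibrated Lambertian photometric stereo for one pixel:
    solve I = L @ (rho * n) in the least-squares sense."""
    g, *_ = np.linalg.lstsq(lights, intensities, rcond=None)
    albedo = np.linalg.norm(g, axis=0)
    normals = g / np.maximum(albedo, 1e-12)
    return normals, albedo

# toy pixel: known normal, three non-coplanar light directions
n_true = np.array([0.0, 0.0, 1.0])
L = np.array([[0.0, 0.0, 1.0],
              [0.6, 0.0, 0.8],
              [0.0, 0.6, 0.8]])
I = L @ (0.7 * n_true)            # Lambertian shading, albedo 0.7
n_est, rho = lambertian_ps(I, L)
assert np.allclose(n_est, n_true) and np.isclose(rho, 0.7)
```

The conditioning issue the abstract analyzes corresponds to L becoming near-rank-deficient: on a clear day the sun sweeps a near-planar arc, so the three-row system above degenerates.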
98. IEEE Transactions on Pattern Analysis and Machine Intelligence Information for Authors.
- Subjects
ARTIFICIAL intelligence ,DIGITAL Object Identifiers - Abstract
The article discusses the aims and scope of the periodical, mentions guidelines for the submission of manuscripts, and covers the periodical's copyright information.
- Published
- 2022
- Full Text
- View/download PDF
99. Cover 3.
- Subjects
ARTIFICIAL intelligence ,DIGITAL Object Identifiers - Published
- 2022
- Full Text
- View/download PDF
100. Continuous Action Reinforcement Learning From a Mixture of Interpretable Experts.
- Author
-
Akrour, Riad, Tateo, Davide, and Peters, Jan
- Subjects
ACTIVE learning ,NONLINEAR functions ,MACHINE learning ,REINFORCEMENT learning ,APPROXIMATION algorithms - Abstract
Reinforcement learning (RL) has demonstrated its ability to solve high dimensional tasks by leveraging non-linear function approximators. However, these successes are mostly achieved by 'black-box' policies in simulated domains. When deploying RL to the real world, several concerns regarding the use of a 'black-box' policy might be raised. In order to make the learned policies more transparent, we propose in this paper a policy iteration scheme that retains a complex function approximator for its internal value predictions but constrains the policy to have a concise, hierarchical, and human-readable structure, based on a mixture of interpretable experts. Each expert selects a primitive action according to a distance to a prototypical state. A key design decision to keep such experts interpretable is to select the prototypical states from trajectory data. The main technical contribution of the paper is to address the challenges introduced by this non-differentiable prototypical state selection procedure. Experimentally, we show that our proposed algorithm can learn compelling policies on continuous action deep RL benchmarks, matching the performance of neural-network-based policies, but returning policies that are more amenable to human inspection than neural network or linear-in-feature policies. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
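The policy structure described above, with experts as (prototypical state, primitive action) pairs selected by distance, can be sketched as a soft nearest-prototype controller. The softmax blend and the `beta` temperature are assumptions for illustration; the paper's contribution lies in learning the prototype selection, which is not shown here.

```python
import numpy as np

def moe_policy(state, prototypes, actions, beta=5.0):
    """Mixture-of-interpretable-experts sketch: each expert is a
    (prototypical state, primitive action) pair; the output action is
    a softmax-weighted blend by distance to the prototypes, so the
    nearest prototype dominates."""
    d = np.linalg.norm(prototypes - state, axis=1)
    w = np.exp(-beta * d)
    w /= w.sum()
    return w @ actions  # blend of primitive actions

# two experts in a 2-D state space with opposite 1-D actions
prototypes = np.array([[0.0, 0.0], [1.0, 1.0]])
actions = np.array([[-1.0], [1.0]])
a = moe_policy(np.array([0.0, 0.0]), prototypes, actions, beta=10.0)
assert a[0] < 0  # near the first prototype, its action dominates
```

Reading such a policy is straightforward: each row of `prototypes` names a situation, and the matching row of `actions` says what the policy does there, which is the human-readable structure the abstract targets.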