Search Results (468 results)
2. Geometry-Guided Street-View Panorama Synthesis From Satellite Imagery.
- Authors: Shi, Yujiao, Campbell, Dylan, Yu, Xin, and Li, Hongdong
- Subjects: REMOTE-sensing images; PIXELS; GENERATIVE adversarial networks; PANORAMAS; LANDSAT satellites; GEOSTATIONARY satellites
- Abstract:
This paper presents a new approach for synthesizing a novel street-view panorama given a satellite image, as if captured from the geographical location at the center of the satellite image. Existing works approach this as an image generation problem, adopting generative adversarial networks to implicitly learn the cross-view transformations, but ignore the geometric constraints. In this paper, we make the geometric correspondences between the satellite and street-view images explicit so as to facilitate the transfer of information between domains. Specifically, we observe that when a 3D point is visible in both views, and the height of the point relative to the camera is known, there is a deterministic mapping between the projected points in the images. Motivated by this, we develop a novel satellite to street-view projection (S2SP) module which learns the height map and projects the satellite image to the ground-level viewpoint, explicitly connecting corresponding pixels. With these projected satellite images as input, we next employ a generator to synthesize realistic street-view panoramas that are geometrically consistent with the satellite images. Our S2SP module is differentiable and the whole framework is trained in an end-to-end manner. Extensive experimental results on two cross-view benchmark datasets demonstrate that our method generates more accurate and consistent images than existing approaches. [ABSTRACT FROM AUTHOR]
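The deterministic cross-view mapping this abstract relies on can be sketched in a few lines. The sketch below assumes a camera at the satellite-image center and a known point height relative to the camera (the quantity the paper's S2SP module learns); the function name and parameters are illustrative, not the authors' code.

```python
import math

def panorama_to_satellite(azimuth, elevation, height, meters_per_pixel):
    """Map a street-view panorama direction to a satellite-image offset.

    azimuth: horizontal angle in radians (0 = north, clockwise).
    elevation: angle below the horizon in radians (positive = downward).
    height: height of the 3D point relative to the camera in meters
            (negative for points below the camera, e.g. the ground).
    Returns (east, north) pixel offsets from the satellite-image center.
    """
    # A point seen at downward angle `elevation` with vertical offset
    # `height` lies at horizontal distance d, where tan(elevation) = -height/d.
    d = -height / math.tan(elevation)
    east = d * math.sin(azimuth) / meters_per_pixel
    north = d * math.cos(azimuth) / meters_per_pixel
    return east, north
```

For example, the ground seen 45 degrees below the horizon from a camera 1.6 m high lies 1.6 m away horizontally, a fixed offset in the satellite image once the ground resolution is known.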
- Published: 2022
3. Investigating Bi-Level Optimization for Learning and Vision From a Unified Perspective: A Survey and Beyond.
- Authors: Liu, Risheng, Gao, Jiaxin, Zhang, Jin, Meng, Deyu, and Lin, Zhouchen
- Subjects: BILEVEL programming; COMPUTER vision; REINFORCEMENT learning; AUTOMATIC differentiation; VISUAL fields; DEEP learning; MACHINE learning
- Abstract:
Bi-Level Optimization (BLO) originated in economic game theory and was later introduced into the optimization community. BLO is able to handle problems with a hierarchical structure, involving two levels of optimization tasks, where one task is nested inside the other. In machine learning and computer vision, despite their different motivations and mechanisms, many complex problems, such as hyper-parameter optimization, multi-task and meta learning, neural architecture search, adversarial learning, and deep reinforcement learning, all contain a series of closely related subproblems. In this paper, we first uniformly express these complex learning and vision problems from the perspective of BLO. Then we construct a best-response-based single-level reformulation and establish a unified algorithmic framework to understand and formulate mainstream gradient-based BLO methodologies, covering aspects ranging from fundamental automatic differentiation schemes to various accelerations, simplifications, extensions, and their convergence and complexity properties. Last but not least, we discuss the potential of our unified BLO framework for designing new algorithms and point out some promising directions for future research. A list of important papers discussed in this survey, corresponding codes, and additional resources on BLOs are publicly available at: https://github.com/vis-opt-group/BLO. [ABSTRACT FROM AUTHOR]
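The best-response-based single-level reformulation the survey builds on can be illustrated on a toy problem. Below, the lower-level solution y*(x) is obtained by gradient descent and the hypergradient is approximated by central differences (standing in for the automatic differentiation schemes the survey covers); the toy objectives and all names are illustrative.

```python
def inner_solve(x, steps=50, lr=0.2):
    # Lower level: y*(x) = argmin_y (y - x)^2, solved by gradient descent.
    y = 0.0
    for _ in range(steps):
        y -= lr * 2.0 * (y - x)
    return y

def upper_obj(x):
    # Upper level evaluated at the best response: F(x, y*(x)) = (y*(x) - 3)^2.
    return (inner_solve(x) - 3.0) ** 2

def hypergradient(x, eps=1e-5):
    # Gradient of the single-level reformulation, here via central
    # differences rather than automatic differentiation.
    return (upper_obj(x + eps) - upper_obj(x - eps)) / (2 * eps)

def solve_blo(x=0.0, steps=100, lr=0.1):
    # Outer loop: descend the upper objective through the inner solver.
    for _ in range(steps):
        x -= lr * hypergradient(x)
    return x
```

Here y*(x) = x, so the reformulated problem is min_x (x - 3)^2 and the outer loop converges to x = 3.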
- Published: 2022
4. Reversible Data Hiding By Using CNN Prediction and Adaptive Embedding.
- Authors: Hu, Runwen and Xiang, Shijun
- Subjects: REVERSIBLE data hiding (Computer science); GLOBAL optimization; FORECASTING
- Abstract:
In the field of reversible data hiding (RDH), how to predict an image and how to embed a message into the image with minimal distortion are two important aspects. In this paper, we propose a novel and efficient RDH method built on an intelligent predictor and an adaptive embedding strategy. In the prediction stage, we first construct a convolutional neural network (CNN) based predictor by dividing an image into four parts, so that each part can be predicted using the other three parts as context, improving prediction performance. Compared with existing predictors, the proposed CNN predictor can use more neighboring pixels for prediction by exploiting its multiple receptive fields and global optimization capacity. In the embedding stage, we also develop a prediction-error-ordering (PEO) based adaptive embedding strategy, which better adapts to image content and thus efficiently reduces embedding distortion by using background complexity to select and pair the smaller prediction errors for data hiding. With the proposed CNN prediction and embedding strategies, the RDH method presented in this paper provides satisfactory results in improving the visual quality of data-hidden images, e.g., the average PSNR value for the Kodak benchmark dataset reaches as high as 63.59 dB with an embedding capacity of 10,000 bits. Extensive experimental results show that the proposed RDH method is superior to existing state-of-the-art works. [ABSTRACT FROM AUTHOR]
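The paper's CNN predictor and PEO pairing are not reproduced here, but the reversible-embedding idea itself can be illustrated with a textbook histogram-shifting scheme on prediction errors (embed at the zero bin, shift positive errors to make room); the helper names are illustrative, and the sketch assumes the payload has exactly as many bits as there are zero-valued errors.

```python
def embed(errors, bits):
    """Histogram-shifting reversible embedding at the zero bin."""
    out, i = [], 0
    for e in errors:
        if e > 0:
            out.append(e + 1)            # shift to free the bin next to the peak
        elif e == 0 and i < len(bits):
            out.append(bits[i]); i += 1  # the peak bin carries one payload bit
        else:
            out.append(e)                # negative errors are left untouched
    return out

def extract(marked):
    """Recover both the payload bits and the original errors exactly."""
    bits, errors = [], []
    for e in marked:
        if e == 0:
            bits.append(0); errors.append(0)
        elif e == 1:
            bits.append(1); errors.append(0)
        elif e > 1:
            errors.append(e - 1)         # undo the shift
        else:
            errors.append(e)
    return errors, bits
```

Because every operation is invertible, the cover errors are restored bit-exactly after extraction, which is the defining property of RDH.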
- Published: 2022
5. Pharmacological, Non-Pharmacological Policies and Mutation: An Artificial Intelligence Based Multi-Dimensional Policy Making Algorithm for Controlling the Casualties of the Pandemic Diseases.
- Authors: Tutsoy, Onder
- Subjects: ARTIFICIAL intelligence; PANDEMICS; PARAMETRIC modeling; ALGORITHMS; VACCINATION policies; MULTIDIMENSIONAL databases
- Abstract:
Fighting pandemic diseases with unique characteristics requires sophisticated new approaches such as artificial intelligence. This paper develops an artificial intelligence algorithm to produce multi-dimensional policies for controlling and minimizing pandemic casualties under limited pharmacological resources. In this respect, a comprehensive parametric model with a priority- and age-specific vaccination policy and a variety of non-pharmacological policies is introduced. This parametric model is used to construct an artificial intelligence algorithm by following the exact analogy of the model-based solution. The parametric model is also manipulated by the artificial intelligence algorithm to seek the best multi-dimensional non-pharmacological policies that minimize future pandemic casualties as desired. The role of pharmacological and non-pharmacological policies in the uncertain future casualties is extensively analyzed on real data. It is shown that the developed artificial intelligence algorithm is able to produce efficient policies that satisfy particular optimization targets, such as prioritizing the minimization of deaths over infections, or imposing curfews on people aged over 65 rather than applying other non-pharmacological policies. The paper finally analyzes a variety of mutant virus cases and the corresponding non-pharmacological policies aimed at reducing morbidity and mortality rates. [ABSTRACT FROM AUTHOR]
- Published: 2022
6. Privacy Preserving Defense For Black Box Classifiers Against On-Line Adversarial Attacks.
- Authors: Theagarajan, Rajkumar and Bhanu, Bir
- Subjects: DEEP learning; PRIVACY; IMAGE recognition (Computer vision)
- Abstract:
Deep learning models have been shown to be vulnerable to adversarial attacks: imperceptible perturbations added to an image such that the deep learning model misclassifies the image with high confidence. Existing adversarial defenses validate their performance using only classification accuracy. However, classification accuracy by itself is not a reliable metric for determining whether the resulting image is "adversarial-free". This is a foundational problem for online image recognition applications, where the ground truth of the incoming image is not known, and hence we can neither compute the accuracy of the classifier nor validate whether the image is "adversarial-free". This paper proposes a novel privacy-preserving framework for defending black-box classifiers against adversarial attacks using an ensemble of iterative adversarial image purifiers whose performance is continuously validated in a loop using Bayesian uncertainties. The proposed approach can convert a single-step black-box adversarial defense into an iterative defense, and proposes three novel privacy-preserving Knowledge Distillation (KD) approaches that use prior meta-information from various datasets to mimic the performance of the black-box classifier. Additionally, this paper proves the existence of an optimal distribution for the purified images that can reach a theoretical lower bound, beyond which the image can no longer be purified. Experimental results on six public benchmark datasets, namely 1) Fashion-MNIST, 2) CIFAR-10, 3) GTSRB, 4) MIO-TCD, 5) Tiny-ImageNet, and 6) MS-Celeb, show that the proposed approach can consistently detect adversarial examples and purify or reject them against a variety of adversarial attacks. [ABSTRACT FROM AUTHOR]
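The validate-in-a-loop idea can be sketched generically: keep purifying until an uncertainty estimate falls below a threshold, otherwise reject. The `purifier` and `uncertainty` callables below are placeholders for the paper's ensemble of purifiers and its Bayesian uncertainty estimate; all names are illustrative.

```python
def purify_or_reject(image, purifier, uncertainty, max_steps=10, tau=0.1):
    """Iteratively purify an input; accept once the uncertainty estimate
    drops below tau, otherwise reject after max_steps purification passes.
    Returns (final_image, accepted)."""
    for _ in range(max_steps):
        if uncertainty(image) < tau:
            return image, True           # confident enough: accept
        image = purifier(image)          # one more purification pass
    return image, uncertainty(image) < tau
```

With a toy "image" represented by its distance from the clean manifold, a purifier that halves the distance needs four passes to cross tau = 0.1 from a starting distance of 1.0.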
- Published: 2022
12. VideoDG: Generalizing Temporal Relations in Videos to Novel Domains.
- Authors: Yao, Zhiyu, Wang, Yunbo, Wang, Jianmin, Yu, Philip S., and Long, Mingsheng
- Subjects: VIDEOS; DATA augmentation; GENERALIZATION
- Abstract:
This paper introduces video domain generalization, where most video classification networks degenerate due to a lack of exposure to target domains with divergent distributions. We observe that global temporal features are less generalizable because of temporal domain shift: videos from unseen domains may exhibit an unexpected absence or misalignment of temporal relations. This finding motivates us to solve video domain generalization by effectively learning local-relation features at different timescales, which are more generalizable, and exploiting them along with global-relation features to maintain discriminability. This paper presents the VideoDG framework with two technical contributions. The first is a new deep architecture, the Adversarial Pyramid Network, which improves the generalizability of video features by progressively capturing local-relation, global-relation, and cross-relation features. On the basis of pyramid features, the second contribution is a new and robust approach to adversarial data augmentation that can bridge different video domains by improving the diversity and quality of augmented data. We construct three video domain generalization benchmarks in which domains are divided according to different datasets, different consequences of actions, or different camera views, respectively. VideoDG consistently outperforms combinations of previous video classification models and existing domain generalization methods on all benchmarks. [ABSTRACT FROM AUTHOR]
- Published: 2022
13. Improved Variance Reduction Methods for Riemannian Non-Convex Optimization.
- Authors: Han, Andi and Gao, Junbin
- Subjects: PRINCIPAL components analysis
- Abstract:
Variance reduction is popular for accelerating gradient descent and stochastic gradient descent for optimization problems defined on both Euclidean spaces and Riemannian manifolds. This paper further improves on existing variance reduction methods for non-convex Riemannian optimization, including R-SVRG and R-SRG/R-SPIDER, by providing a unified framework for batch-size adaptation. This framework is more general than existing works in that it considers retraction, vector transport, and mini-batch stochastic gradients. We show that the adaptive-batch variance reduction methods require lower gradient complexities for both general non-convex and gradient-dominated functions, under both finite-sum and online optimization settings. Moreover, under the new framework, we complete the analysis of R-SVRG and R-SRG, which is currently missing in the literature. We prove convergence of R-SVRG with a much simpler analysis, which leads to curvature-free complexity bounds. We also show improved results for R-SRG under double-loop convergence, matching the optimal complexities of R-SPIDER. In addition, we prove the first online complexity results for R-SVRG and R-SRG. Lastly, we discuss the potential of adapting batch size for non-smooth, constrained, and second-order Riemannian optimizers. Extensive experiments on a variety of applications support the analysis and claims in the paper. [ABSTRACT FROM AUTHOR]
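For readers unfamiliar with the R-SVRG family, the underlying Euclidean SVRG recursion is easy to state; the Riemannian variants studied in this paper replace the additive update with a retraction and move the snapshot gradient via vector transport. The sketch below is a generic Euclidean SVRG on a toy finite-sum problem, not the paper's algorithm; all names are illustrative.

```python
import random

def svrg(grads, full_grad, x0, lr=0.05, epochs=20, m=None, seed=0):
    """Euclidean SVRG for min_x (1/n) sum_i f_i(x), scalar x.

    grads: list of per-component gradient functions g_i(x).
    full_grad: gradient of the full objective.
    """
    rng = random.Random(seed)
    n = len(grads)
    m = m or n                       # inner steps per snapshot
    x = x0
    for _ in range(epochs):
        x_ref = x
        g_ref = full_grad(x_ref)     # full gradient at the snapshot point
        for _ in range(m):
            i = rng.randrange(n)
            # Variance-reduced stochastic gradient: unbiased, and its
            # variance vanishes as x approaches x_ref.
            v = grads[i](x) - grads[i](x_ref) + g_ref
            x -= lr * v              # a Riemannian method would retract here
    return x
```

On the toy least-squares instance in the test, both components are minimized at x = 1, so the iterates converge there.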
- Published: 2022
14. Detailed Avatar Recovery From Single Image.
- Authors: Zhu, Hao, Zuo, Xinxin, Yang, Haotian, Wang, Sen, Cao, Xun, and Yang, Ruigang
- Subjects: ARTIFICIAL neural networks; AVATARS (Virtual reality); HUMAN body
- Abstract:
This paper presents a novel framework to recover a detailed avatar from a single image. It is a challenging task due to factors such as variations in human shape, body pose, texture, and viewpoint. Prior methods typically attempt to recover the human body shape using a parametric template that lacks surface details, so the resulting body shape appears unclothed. In this paper, we propose a novel learning-based framework that combines the robustness of the parametric model with the flexibility of free-form 3D deformation. We use deep neural networks to refine the 3D shape in a Hierarchical Mesh Deformation (HMD) framework, utilizing constraints from body joints, silhouettes, and per-pixel shading information. Our method can restore detailed human body shapes with complete textures beyond skinned models. Experiments demonstrate that our method outperforms previous state-of-the-art approaches, achieving better accuracy in terms of both 2D IoU and 3D metric distance. [ABSTRACT FROM AUTHOR]
- Published: 2022
15. Continuous Action Reinforcement Learning From a Mixture of Interpretable Experts.
- Authors: Akrour, Riad, Tateo, Davide, and Peters, Jan
- Subjects: ACTIVE learning; NONLINEAR functions; MACHINE learning; REINFORCEMENT learning; APPROXIMATION algorithms
- Abstract:
Reinforcement learning (RL) has demonstrated its ability to solve high-dimensional tasks by leveraging non-linear function approximators. However, these successes are mostly achieved by 'black-box' policies in simulated domains. When deploying RL in the real world, several concerns may be raised regarding the use of a 'black-box' policy. To make the learned policies more transparent, we propose in this paper a policy iteration scheme that retains a complex function approximator for its internal value predictions but constrains the policy to have a concise, hierarchical, and human-readable structure, based on a mixture of interpretable experts. Each expert selects a primitive action according to a distance to a prototypical state. A key design decision for keeping such experts interpretable is to select the prototypical states from trajectory data. The main technical contribution of the paper is to address the challenges introduced by this non-differentiable prototypical state selection procedure. Experimentally, we show that our proposed algorithm can learn compelling policies on continuous-action deep RL benchmarks, matching the performance of neural-network-based policies, but returning policies that are more amenable to human inspection than neural network or linear-in-feature policies. [ABSTRACT FROM AUTHOR]
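The gating described in the abstract, with each expert tied to a prototypical state and selected by distance, can be sketched as a hard (argmin) rule; a soft, trainable version would weight experts by distance instead. Names below are illustrative, not the paper's code.

```python
def mixture_policy(state, prototypes, actions):
    """Return the primitive action of the expert whose prototypical state
    is closest (squared L2 distance) to the current state."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    best = min(range(len(prototypes)), key=lambda i: dist2(state, prototypes[i]))
    return actions[best]
```

The resulting policy is human-readable by construction: each rule is "near this recorded state, do this action".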
- Published: 2022
16. Recent Advances in Large Margin Learning.
- Authors: Guo, Yiwen and Zhang, Changshui
- Subjects: ARTIFICIAL neural networks; MACHINE learning; SUPPORT vector machines
- Abstract:
This paper serves as a survey of recent advances in large margin training and its theoretical foundations, mostly for (nonlinear) deep neural networks (DNNs), which have arguably been the most prominent machine learning models for large-scale data in the community over the past decade. We generalize the formulation of classification margins from classical research to the latest DNNs, summarize theoretical connections between the margin, network generalization, and robustness, and comprehensively introduce recent efforts to enlarge the margins of DNNs. Since the viewpoints of different methods differ, we categorize them into groups for ease of comparison and discussion. We hope our discussion and overview inspire new research in the community that aims to improve the performance of DNNs, and we also point to directions where the large margin principle can be verified to provide theoretical evidence for why certain regularizations for DNNs function well in practice. We have kept the paper concise so that the crucial spirit of large margin learning and related methods is better emphasized. [ABSTRACT FROM AUTHOR]
- Published: 2022
17. Transform Quantization for CNN Compression.
- Authors: Young, Sean I., Zhe, Wang, Taubman, David, and Girod, Bernd
- Subjects: CONVOLUTIONAL neural networks; RATE distortion theory; VIDEO compression; LOSSY data compression; DATA compression
- Abstract:
In this paper, we compress convolutional neural network (CNN) weights post-training via transform quantization. Previous CNN quantization techniques tend to ignore the joint statistics of weights and activations, producing sub-optimal CNN performance at a given quantization bit-rate, or consider their joint statistics during training only and do not facilitate efficient compression of already trained CNN models. We optimally transform (decorrelate) and quantize the weights post-training using a rate–distortion framework to improve compression at any given quantization bit-rate. Transform quantization unifies quantization and dimensionality reduction (decorrelation) techniques in a single framework to facilitate low bit-rate compression of CNNs and efficient inference in the transform domain. We first introduce a theory of rate and distortion for CNN quantization and pose optimum quantization as a rate–distortion optimization problem. We then show that this problem can be solved using optimal bit-depth allocation following decorrelation by the optimal End-to-end Learned Transform (ELT) we derive in this paper. Experiments demonstrate that transform quantization advances the state of the art in CNN compression in both retrained and non-retrained quantization scenarios. In particular, we find that transform quantization with retraining is able to compress CNN models such as AlexNet, ResNet and DenseNet to very low bit-rates (1–2 bits). [ABSTRACT FROM AUTHOR]
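The decorrelate-then-quantize idea can be sketched with a plain KLT (PCA) followed by uniform quantization; note this is a generic stand-in, not the paper's learned ELT or its optimal bit-depth allocation, and the function name is illustrative.

```python
import numpy as np

def transform_quantize(W, bits=4):
    """Decorrelate weight rows with a KLT and quantize uniformly in the
    transform domain, then reconstruct in the weight domain."""
    mu = W.mean(axis=0)
    X = W - mu
    # Eigenvectors of the sample covariance give the decorrelating transform.
    cov = X.T @ X / len(X)
    _, U = np.linalg.eigh(cov)
    Z = X @ U                                    # decorrelated coefficients
    span = Z.max() - Z.min()
    step = span / (2 ** bits - 1) if span > 0 else 1.0
    Zq = np.round(Z / step) * step               # uniform quantization
    return Zq @ U.T + mu                         # back to the weight domain
```

Because the transform is orthogonal, quantization error in the transform domain carries over to the weight domain with the same Frobenius norm, which is what makes the rate-distortion analysis tractable.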
- Published: 2022
18. Deep Learning-Based Multi-Focus Image Fusion: A Survey and a Comparative Study.
- Subjects: IMAGE fusion; IMAGE processing; DEEP learning; GENERATIVE adversarial networks
- Abstract:
Multi-focus image fusion (MFIF) is an important area in image processing. Since 2017, deep learning has been introduced to the field of MFIF and various methods have been proposed. However, there is a lack of survey papers that discuss deep learning-based MFIF methods in detail. In this study, we fill this gap by giving a detailed survey on deep learning-based MFIF algorithms, including methods, datasets and evaluation metrics. To the best of our knowledge, this is the first survey paper that focuses on deep learning-based approaches in the field of MFIF. Besides, extensive experiments have been conducted to compare the performance of deep learning-based MFIF algorithms with conventional MFIF approaches. By analyzing qualitative and quantitative results, we give some observations on the current status of MFIF and discuss some future prospects of this field. [ABSTRACT FROM AUTHOR]
- Published: 2022
19. Learning Layout and Style Reconfigurable GANs for Controllable Image Synthesis.
- Authors: Sun, Wei and Wu, Tianfu
- Subjects: GENERATIVE adversarial networks; COGNITIVE styles; DEEP learning; DNA-binding proteins
- Abstract:
With the remarkable recent progress on learning deep generative models, it becomes increasingly interesting to develop models for controllable image synthesis from reconfigurable structured inputs. This paper focuses on a recently emerged task, layout-to-image, whose goal is to learn generative models for synthesizing photo-realistic images from a spatial layout (i.e., object bounding boxes configured in an image lattice) and its style codes (i.e., structural and appearance variations encoded by latent vectors). This paper first proposes an intuitive paradigm for the task, layout-to-mask-to-image, which learns to unfold object masks in a weakly-supervised way based on an input layout and object style codes. The layout-to-mask component deeply interacts with layers in the generator network to bridge the gap between an input layout and synthesized images. Then, this paper presents a method built on Generative Adversarial Networks (GANs) for the proposed layout-to-mask-to-image synthesis with layout and style control at both image and object levels. The controllability is realized by a proposed novel Instance-Sensitive and Layout-Aware Normalization (ISLA-Norm) scheme. A layout semi-supervised version of the proposed method is further developed without sacrificing performance. In experiments, the proposed method is tested in the COCO-Stuff dataset and the Visual Genome dataset with state-of-the-art performance obtained. [ABSTRACT FROM AUTHOR]
- Published: 2022
20. Low-Rank Riemannian Optimization for Graph-Based Clustering Applications.
- Authors: Douik, Ahmed and Hassibi, Babak
- Subjects: RIEMANNIAN manifolds; RIEMANNIAN geometry; STATISTICS; STOCHASTIC matrices; MACHINE learning; PROBLEM solving
- Abstract:
With the abundance of data, machine learning applications have attracted increasing attention over the last decade. An attractive approach to robustifying the statistical analysis is to preprocess the data through clustering. This paper develops a low-complexity Riemannian optimization framework for solving optimization problems on the set of positive semidefinite stochastic matrices. The low-complexity feature of the proposed algorithms stems from the factorization of the optimization variable X = YY^T and from deriving conditions on the number of columns of Y under which the factorization yields a satisfactory solution. The paper further investigates the embedded and quotient geometries of the resulting Riemannian manifolds. In particular, the paper explicitly derives the tangent space, Riemannian gradients and Hessians, and a retraction operator, allowing the design of efficient first- and second-order optimization methods for the graph-based clustering applications of interest. The numerical results reveal that the resulting algorithms present a clear complexity advantage compared with state-of-the-art Euclidean and Riemannian approaches for graph clustering applications. [ABSTRACT FROM AUTHOR]
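The factorization X = YY^T turns an n x n optimization variable into an n x r factor. As a minimal Euclidean illustration (the paper additionally enforces the stochastic-matrix constraints and works on the resulting Riemannian manifold), one can fit YY^T to a given PSD matrix by gradient descent on Y; all names are illustrative.

```python
import numpy as np

def low_rank_fit(A, r, steps=2000, lr=0.01, seed=0):
    """Minimize ||Y Y^T - A||_F^2 over the n x r factor Y by gradient
    descent; a Euclidean stand-in for the Riemannian methods in the paper."""
    rng = np.random.RandomState(seed)
    n = A.shape[0]
    Y = 0.1 * rng.randn(n, r)            # small random init escapes Y = 0
    for _ in range(steps):
        G = 4.0 * (Y @ Y.T - A) @ Y      # gradient of the squared residual
        Y -= lr * G
    return Y
```

Working with Y instead of X keeps every iterate positive semidefinite by construction and costs O(nr) memory instead of O(n^2), which is the complexity advantage the abstract refers to.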
- Published
- 2022
- Full Text
- View/download PDF
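The feasible set behind the factorization in this record is easy to verify on a toy example. The sketch below (not the paper's algorithm) shows that for a hard clustering with a column-normalized indicator matrix Y, the factored variable X = Y Yᵀ is automatically positive semidefinite and row-stochastic, which is exactly the set the Riemannian framework optimizes over.

```python
import numpy as np

# Toy assignment of 5 points to 2 clusters.
labels = np.array([0, 0, 0, 1, 1])
n, r = len(labels), 2

# Normalized indicator: Y[i, k] = 1/sqrt(n_k) iff point i is in cluster k.
Y = np.zeros((n, r))
for k in range(r):
    idx = labels == k
    Y[idx, k] = 1.0 / np.sqrt(idx.sum())

# The factored variable X = Y Y^T.
X = Y @ Y.T

eigvals = np.linalg.eigvalsh(X)   # PSD: all eigenvalues >= 0
row_sums = X.sum(axis=1)          # stochastic: every row sums to 1
```

Optimizing over Y instead of X (with r much smaller than n) is what gives the paper's methods their low-complexity advantage.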
21. Structure of Multiple Mirror System From Kaleidoscopic Projections of Single 3D Point.
- Author
-
Takahashi, Kosuke and Nobuhara, Shohei
- Subjects
- *
GRAPHICAL projection , *IMAGING systems , *CAMERA calibration , *PARAMETER estimation , *PROBLEM solving - Abstract
This paper proposes a novel algorithm for discovering the structure of a kaleidoscopic imaging system that consists of multiple planar mirrors and a camera. The kaleidoscopic imaging system can be regarded as a virtual multi-camera system and has strong advantages in that the virtual cameras are strictly synchronized and share the same intrinsic parameters. In this paper, we focus on the extrinsic calibration of the virtual multi-camera system. The problems to be solved in this paper are two-fold. The first problem is to identify to which mirror chamber each of the 2D projections of mirrored 3D points belongs. The second problem is to estimate all mirror parameters, i.e., the normals and distances of the mirrors. The key contribution of this paper is to propose novel algorithms for these problems using a single 3D point of unknown geometry by utilizing a kaleidoscopic projection constraint, which is an epipolar constraint on mirror reflections. We demonstrate the performance of the proposed algorithms for chamber assignment and estimation of mirror parameters with qualitative and quantitative evaluations using synthesized and real data. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
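The mirror parameters this record estimates (unit normal n and distance d per mirror) enter through the basic geometry of planar reflection. A minimal sketch of that building block, using the standard point-reflection formula rather than anything specific to the paper:

```python
import numpy as np

def reflect(p, n, d):
    """Reflect a 3D point p across the mirror plane {x : n . x = d},
    where n is the unit normal and d the plane's signed distance."""
    return p - 2.0 * (n @ p - d) * n

# Mirror: the plane z = 2 (normal along z, distance 2).
n = np.array([0.0, 0.0, 1.0])
d = 2.0

p = np.array([1.0, -1.0, 5.0])
q = reflect(p, n, d)          # mirrored point seen by a "virtual camera"
back = reflect(q, n, d)       # reflection is an involution: back == p
```

Chained reflections of a single 3D point across the candidate mirrors generate the multiple projections that the paper's kaleidoscopic (epipolar-style) constraint ties together.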
22. Part-Level Car Parsing and Reconstruction in Single Street View Images.
- Author
-
Geng, Qichuan, Zhang, Hong, Lu, Feixiang, Huang, Xinyu, Wang, Sen, Zhou, Zhong, and Yang, Ruigang
- Subjects
- *
KNOWLEDGE transfer , *AUTOMOBILES , *IMAGE reconstruction - Abstract
Part information has been proven to be resistant to occlusions and viewpoint changes, which are main difficulties in car parsing and reconstruction. However, in the absence of datasets and approaches incorporating car parts, there are limited works that benefit from it. In this paper, we propose the first part-aware approach for joint part-level car parsing and reconstruction in single street view images. Without labor-intensive part annotations on real images, our approach simultaneously estimates pose, shape, and semantic parts of cars. There are two contributions in this paper. First, our network introduces dense part information to facilitate pose and shape estimation, which is further optimized with a novel 3D loss. To obtain part information in real images, a class-consistent method is introduced to implicitly transfer part knowledge from synthesized images. Second, we construct the first high-quality dataset containing 348 car models with physical dimensions and part annotations. Given these models, 60K synthesized images with randomized configurations are generated. Experimental results demonstrate that part knowledge can be effectively transferred with our class-consistent method, which significantly improves part segmentation performance on real street views. By fusing dense part information, our pose and shape estimation results achieve the state-of-the-art performance on the ApolloCar3D and outperform previous approaches by large margins in terms of both A3DP-Abs and A3DP-Rel. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
23. Instance-Dependent Positive and Unlabeled Learning With Labeling Bias Estimation.
- Author
-
Gong, Chen, Wang, Qizhou, Liu, Tongliang, Han, Bo, You, Jane, Yang, Jian, and Tao, Dacheng
- Subjects
- *
ESTIMATION bias , *MAXIMUM likelihood statistics , *MATHEMATICAL optimization , *RANDOM variables , *PRODUCTION scheduling - Abstract
This paper studies instance-dependent Positive and Unlabeled (PU) classification, where whether a positive example will be labeled (indicated by $s$) is not only related to the class label $y$, but also depends on the observation $\mathbf{x}$. Therefore, the labeling probability on positive examples is not uniform, as previous works assumed, but is biased toward some simple or critical data points. To depict this dependency relationship, a graphical model is built in this paper, which further leads to a maximization problem on the induced likelihood function regarding $P(s,y|\mathbf{x})$. By utilizing the well-known EM and Adam optimization techniques, the labeling probability of any positive example $P(s=1|y=1,\mathbf{x})$, as well as the classifier induced by $P(y|\mathbf{x})$, can be acquired. Theoretically, we prove that the critical solution always exists, and is locally unique for a linear model if some sufficient conditions are met. Moreover, we upper bound the generalization error for both linear logistic and non-linear network instantiations of our algorithm, with the convergence rate of expected risk to empirical risk of $\mathcal{O}(1/\sqrt{k}+1/\sqrt{n-k}+1/\sqrt{n})$, where $k$ and $n$ are the sizes of the positive set and the entire training set, respectively. Empirically, we compare our method with state-of-the-art instance-independent and instance-dependent PU algorithms on a wide range of synthetic, benchmark, and real-world datasets, and the experimental results firmly demonstrate the advantage of the proposed method over existing PU approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
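The instance-dependent labeling assumption in this record is easy to simulate. The sketch below uses toy data and a hypothetical logistic propensity (not the paper's graphical model) to build a labeling probability P(s=1 | y=1, x) that varies with x, so that "easier" positives (larger margin) are labeled more often than hard ones:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
x = rng.normal(size=(1000, 2))
w = np.array([1.5, -1.0])

# True class from a linear model (illustrative).
margin = x @ w
y = (margin > 0).astype(int)

# Instance-dependent propensity: P(s=1 | y=1, x) depends on x
# through the margin, i.e., it is NOT a constant.
propensity = sigmoid(2.0 * margin)
s = (rng.random(1000) < propensity) & (y == 1)

# Easier positives get labeled far more often than borderline ones.
labeled_rate_easy = s[(y == 1) & (margin > 1)].mean()
labeled_rate_hard = s[(y == 1) & (margin > 0) & (margin < 0.5)].mean()
```

A method assuming a uniform labeling probability would mis-weight exactly these borderline positives, which is the bias the paper's EM-based estimation of the propensity corrects.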
24. Dual Encoding for Video Retrieval by Text.
- Author
-
Dong, Jianfeng, Li, Xirong, Xu, Chaoxi, Yang, Xun, Yang, Gang, Wang, Xun, and Wang, Meng
- Subjects
- *
VIDEO coding , *BLENDED learning , *ENCODING , *VIDEOS , *MACHINE learning , *RECURRENT neural networks - Abstract
This paper attacks the challenging problem of video retrieval by text. In such a retrieval paradigm, an end user searches for unlabeled videos by ad-hoc queries described exclusively in the form of a natural-language sentence, with no visual example provided. Given videos as sequences of frames and queries as sequences of words, an effective sequence-to-sequence cross-modal matching is crucial. To that end, the two modalities need to be first encoded into real-valued vectors and then projected into a common space. In this paper we achieve this by proposing a dual deep encoding network that encodes videos and queries into powerful dense representations of their own. Our novelty is two-fold. First, different from prior art that resorts to a specific single-level encoder, the proposed network performs multi-level encoding that represents the rich content of both modalities in a coarse-to-fine fashion. Second, different from a conventional common space learning algorithm which is either concept based or latent space based, we introduce hybrid space learning which combines the high performance of the latent space and the good interpretability of the concept space. Dual encoding is conceptually simple, practically effective and end-to-end trained with hybrid space learning. Extensive experiments on four challenging video datasets show the viability of the new method. Code and data are available at https://github.com/danieljf24/hybrid_space. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
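The cross-modal matching pipeline this record describes (encode each modality, project to a common space, rank by similarity) can be sketched in a few lines. Everything here is a toy stand-in: mean pooling replaces the paper's multi-level encoders, and random projection matrices replace the learned hybrid space.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(seq, proj):
    """Toy encoder: pool a sequence of frame/word vectors, then project
    it into the shared space (stand-in for the dual deep encoders)."""
    return seq.mean(axis=0) @ proj

d_video, d_text, d_common = 16, 8, 4
W_v = rng.normal(size=(d_video, d_common))   # video-side projection
W_t = rng.normal(size=(d_text, d_common))    # text-side projection

videos = [rng.normal(size=(20, d_video)) for _ in range(5)]  # 5 candidate videos
query = rng.normal(size=(7, d_text))                         # 7-word query

v_emb = np.stack([encode(v, W_v) for v in videos])
q_emb = encode(query, W_t)

# Cosine similarity in the common space, then rank candidates.
sims = (v_emb @ q_emb) / (np.linalg.norm(v_emb, axis=1) * np.linalg.norm(q_emb))
ranking = np.argsort(-sims)
```

The paper's contribution lives inside `encode` (coarse-to-fine multi-level encoding) and in how the common space is learned (hybrid concept + latent space); the retrieval loop itself looks like the above.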
25. Not All Samples are Trustworthy: Towards Deep Robust SVP Prediction.
- Author
-
Xu, Qianqian, Yang, Zhiyong, Jiang, Yangbangyan, Cao, Xiaochun, Yao, Yuan, and Huang, Qingming
- Subjects
- *
QUALITY control , *FORECASTING , *NOISE measurement , *COMPUTER vision , *OUTLIER detection , *CROWDSOURCING - Abstract
In this paper, we study the problem of estimating subjective visual properties (SVP) for images, which is an emerging task in Computer Vision. Generally speaking, collecting SVP datasets involves a crowdsourcing process where annotations are obtained from a wide range of online users. Since the process is done without quality control, SVP datasets are known to suffer from noise. This leads to the issue that not all samples are trustworthy. Facing this problem, we need to develop robust models for learning SVP from noisy crowdsourced annotations. In this paper, we construct two general robust learning frameworks for this application. Specifically, in the first framework, we propose a probabilistic framework to explicitly model the sparse unreliable patterns that exist in the dataset. It is noteworthy that we then provide an alternative framework that could reformulate the sparse unreliable patterns as a “contraction” operation over the original loss function. The latter framework leverages not only efficient end-to-end training but also rigorous theoretical analyses. To apply these frameworks, we further provide two models as implementations of the frameworks, where the sparse noise parameters could be interpreted with the HodgeRank theory. Finally, extensive theoretical and empirical studies show the effectiveness of our proposed framework. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
26. Geometry-Aware Generation of Adversarial Point Clouds.
- Author
-
Wen, Yuxin, Lin, Jiehong, Chen, Ke, Chen, C. L. Philip, and Jia, Kui
- Subjects
- *
POINT cloud , *SURFACE roughness , *ORTHOGONAL matching pursuit , *SURFACE properties , *THREE-dimensional imaging , *SOURCE code - Abstract
Machine learning models have been shown to be vulnerable to adversarial examples. While most of the existing methods for adversarial attack and defense work in the 2D image domain, a few recent attempts have been made to extend them to 3D point cloud data. However, adversarial results obtained by these methods typically contain point outliers, which are both noticeable and easy to defend against using simple outlier-removal techniques. Motivated by the different mechanisms by which humans perceive 2D images and 3D shapes, in this paper we propose a new design of geometry-aware objectives, whose solutions favor (the discrete versions of) the desired surface properties of smoothness and fairness. To generate adversarial point clouds, we use a targeted attack misclassification loss that supports continuous pursuit of increasingly malicious signals. Regularizing the targeted attack loss with our proposed geometry-aware objectives results in our proposed method, Geometry-Aware Adversarial Attack ($GeoA^3$). The results of $GeoA^3$ tend to be more harmful, arguably harder to defend against, and carry the key adversarial characterization of being imperceptible to humans. While the main focus of this paper is learning to generate adversarial point clouds, we also present a simple but effective algorithm termed $Geo_{+}A^3$-IterNormPro, with Iterative Normal Projection (IterNormPro), that solves a new objective function $Geo_{+}A^3$ towards surface-level adversarial attacks via generation of adversarial point clouds. We quantitatively evaluate our methods on both synthetic and physical objects in terms of attack success rate and geometric regularity. For a qualitative evaluation, we conduct subjective studies by collecting human preferences from Amazon Mechanical Turk. Comparative results in comprehensive experiments confirm the advantages of our proposed methods. Our source codes are publicly available at https://github.com/Yuxin-Wen/GeoA3. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
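The core idea of this record, regularizing an attack loss with a geometry term so that outlier-ridden perturbations are penalized, can be illustrated with a crude smoothness proxy. The k-NN centroid distance below is an assumption for illustration, not the paper's actual curvature/fairness objectives:

```python
import numpy as np

def knn_smoothness(points, k=4):
    """Rough geometry term: mean squared distance of each point to the
    centroid of its k nearest neighbors (a smoothness proxy; NOT the
    paper's exact regularizer)."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, axis=1)[:, :k]
    centroids = points[nbrs].mean(axis=1)
    return ((points - centroids) ** 2).sum(axis=1).mean()

def attack_objective(points, margin_loss, lam=0.1):
    # Targeted-attack loss regularized by the geometry-aware term.
    return margin_loss + lam * knn_smoothness(points)

rng = np.random.default_rng(0)
clean = rng.normal(size=(64, 3))
outliers = clean.copy()
outliers[:3] += 5.0               # a few conspicuous outlier points

# For an equal misclassification loss, the outlier-ridden cloud pays a
# larger geometric penalty, so the optimizer prefers smooth perturbations.
loss_clean = attack_objective(clean, margin_loss=1.0)
loss_outlier = attack_objective(outliers, margin_loss=1.0)
```

This is why the resulting adversarial clouds resist the simple outlier-removal defenses mentioned in the abstract.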
27. Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos.
- Author
-
Yuan, Yitian, Ma, Lin, Wang, Jingwen, Liu, Wei, and Zhu, Wenwu
- Subjects
- *
TASK analysis - Abstract
Temporal sentence grounding in videos aims to localize one target video segment, which semantically corresponds to a given sentence. Unlike previous methods mainly focusing on matching semantics between the sentence and different video segments, in this paper, we propose a novel semantic conditioned dynamic modulation (SCDM) mechanism, which leverages the sentence semantics to modulate the temporal convolution operations for better correlating and composing the sentence-relevant video contents over time. The proposed SCDM also performs dynamically with respect to the diverse video contents so as to establish a precise semantic alignment between sentence and video. By coupling the proposed SCDM with a hierarchical temporal convolutional architecture, video segments with various temporal scales are composed and localized. Besides, more fine-grained clip-level actionness scores are also predicted with the SCDM-coupled temporal convolution on the bottom layer of the overall architecture, which are further used to adjust the temporal boundaries of the localized segments and thereby lead to more accurate grounding results. Experimental results on benchmark datasets demonstrate that the proposed model can improve the temporal grounding accuracy consistently, and further investigation experiments also illustrate the advantages of SCDM on stabilizing the model training and associating relevant video contents for temporal sentence grounding. Our code for this paper is available at https://github.com/yytzsy/SCDM-TPAMI. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
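The "semantic conditioned dynamic modulation" in this record belongs to the family of feature-wise conditional normalization: sentence semantics predict a scale and shift applied to temporal convolution features. A minimal sketch under that reading (function and weight names are hypothetical; the paper's SCDM is tied into a hierarchical temporal conv architecture):

```python
import numpy as np

def scdm_like_modulation(feat, sent, W_gamma, W_beta, eps=1e-5):
    """Sketch of semantics-conditioned modulation.

    feat: (C, T) temporal conv features; sent: (D,) sentence embedding.
    Normalize along time, then scale/shift with gamma/beta predicted
    from the sentence (feature-wise modulation; illustrative only).
    """
    mu = feat.mean(axis=1, keepdims=True)
    std = feat.std(axis=1, keepdims=True)
    normed = (feat - mu) / (std + eps)
    gamma = W_gamma @ sent            # (C,) per-channel scale from semantics
    beta = W_beta @ sent              # (C,) per-channel shift from semantics
    return gamma[:, None] * normed + beta[:, None]

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 32))       # 8 channels, 32 time steps
sent = rng.normal(size=5)             # toy sentence embedding
W_gamma = rng.normal(size=(8, 5))
W_beta = rng.normal(size=(8, 5))
out = scdm_like_modulation(feat, sent, W_gamma, W_beta)
```

Because gamma/beta depend on the sentence, different queries reshape the same video features differently, which is the "dynamic" correlation the abstract emphasizes.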
28. A Survey on Deep Learning Techniques for Stereo-Based Depth Estimation.
- Author
-
Laga, Hamid, Jospin, Laurent Valentin, Boussaid, Farid, and Bennamoun, Mohammed
- Subjects
- *
DEEP learning , *COMPUTER vision , *MACHINE learning , *AUGMENTED reality , *LEARNING communities , *AUTONOMOUS vehicles - Abstract
Estimating depth from RGB images is a long-standing ill-posed problem, which has been explored for decades by the computer vision, graphics, and machine learning communities. Among the existing techniques, stereo matching remains one of the most widely used in the literature due to its strong connection to the human binocular system. Traditionally, stereo-based depth estimation has been addressed through matching hand-crafted features across multiple images. Despite the extensive amount of research, these traditional techniques still suffer in the presence of highly textured areas, large uniform regions, and occlusions. Motivated by their growing success in solving various 2D and 3D vision problems, deep learning for stereo-based depth estimation has attracted a growing interest from the community, with more than 150 papers published in this area between 2014 and 2019. This new generation of methods has demonstrated a significant leap in performance, enabling applications such as autonomous driving and augmented reality. In this paper, we provide a comprehensive survey of this new and continuously growing field of research, summarize the most commonly used pipelines, and discuss their benefits and limitations. In retrospect of what has been achieved so far, we also conjecture what the future may hold for deep learning-based stereo for depth estimation research. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
29. Learnable Pooling in Graph Convolutional Networks for Brain Surface Analysis.
- Author
-
Gopinath, Karthik, Desrosiers, Christian, and Lombaert, Herve
- Subjects
- *
SURFACE analysis , *NON-Euclidean geometry , *ALZHEIMER'S disease , *THREE-dimensional imaging , *MACHINE learning - Abstract
Brain surface analysis is essential to neuroscience; however, the complex geometry of the brain cortex hinders computational methods for this task. The difficulty arises from a discrepancy between 3D imaging data, which is represented in Euclidean space, and the non-Euclidean geometry of the highly-convoluted brain surface. Recent advances in machine learning have enabled the use of neural networks for non-Euclidean spaces. These facilitate the learning of surface data, yet pooling strategies often remain constrained to a single fixed graph. This paper proposes a new learnable graph pooling method for processing multiple surface-valued data to output subject-based information. The proposed method innovates by learning an intrinsic aggregation of graph nodes based on graph spectral embedding. We illustrate the advantages of our approach with in-depth experiments on two large-scale benchmark datasets. The ablation study in the paper illustrates the impact of various factors affecting our learnable pooling method. The flexibility of the pooling strategy is evaluated on four different prediction tasks, namely, subject-sex classification, regression of cortical region sizes, classification of Alzheimer’s disease stages, and brain age regression. Our experiments demonstrate the superiority of our learnable pooling approach compared to other pooling techniques for graph convolutional networks, with results improving the state-of-the-art in brain surface analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
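Graph pooling of the kind this record builds on can be written as an assignment matrix mapping N nodes to M pooled nodes. In the sketch below the assignment S is fixed by hand; in the paper it is learned from the graph spectral embedding. The shapes and the coarsening rule (features Sᵀ X, adjacency Sᵀ A S) follow the common assignment-based pooling pattern, not the paper's exact layer.

```python
import numpy as np

def graph_pool(X, A, S):
    """Assignment-based graph pooling sketch.

    X: (N, F) node features, A: (N, N) adjacency,
    S: (N, M) soft assignment of nodes to pooled nodes (rows sum to 1).
    Returns pooled features S^T X and coarsened adjacency S^T A S.
    """
    return S.T @ X, S.T @ A @ S

# 4 nodes pooled into 2 super-nodes with a soft assignment.
S = np.array([[1.0, 0.0],
              [0.8, 0.2],
              [0.1, 0.9],
              [0.0, 1.0]])
X = np.arange(8.0).reshape(4, 2)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

Xp, Ap = graph_pool(X, A, S)
```

Making S a function of the spectral embedding (instead of a fixed matrix) is what lets the pooling adapt across subjects whose cortical graphs differ.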
30. 2021 Index IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 43.
- Subjects
- *
ARTIFICIAL intelligence , *SUBJECT headings , *INDEXES - Abstract
This index covers all technical items - papers, correspondence, reviews, etc. - that appeared in this periodical during the year, and items from previous years that were commented upon or corrected in this year. Departments and other items may also be covered if they have been judged to have archival value. The Author Index contains the primary entry for each item, listed under the first author's name. The primary entry includes the co-authors' names, the title of the paper or other item, and its location, specified by the publication abbreviation, year, month, and inclusive pagination. The Subject Index contains entries describing the item under all appropriate subject headings, plus the first author's name, the publication abbreviation, month, and year, and inclusive pages. Note that the item title is found only under the primary entry in the Author Index. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
31. Self-Supervised Visual Feature Learning With Deep Neural Networks: A Survey.
- Author
-
Jing, Longlong and Tian, Yingli
- Subjects
- *
VISUAL learning , *SUPERVISED learning , *DEEP learning , *COMPUTER vision - Abstract
Large-scale labeled data are generally required to train deep neural networks in order to obtain better performance in visual feature learning from images or videos for computer vision applications. To avoid the extensive cost of collecting and annotating large-scale datasets, as a subset of unsupervised learning methods, self-supervised learning methods have been proposed to learn general image and video features from large-scale unlabeled data without using any human-annotated labels. This paper provides an extensive review of deep learning-based self-supervised general visual feature learning methods from images or videos. First, the motivation, general pipeline, and terminologies of this field are described. Then the common deep neural network architectures that are used for self-supervised learning are summarized. Next, the schema and evaluation metrics of self-supervised learning methods are reviewed, followed by the commonly used datasets for images, videos, audios, and 3D data, as well as the existing self-supervised visual feature learning methods. Finally, quantitative performance comparisons of the reviewed methods on benchmark datasets are summarized and discussed for both image and video feature learning. Lastly, the paper concludes with a set of promising future directions for self-supervised visual feature learning. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
32. Domain Knowledge Alleviates Adversarial Attacks in Multi-Label Classifiers.
- Author
-
Melacci, Stefano, Ciravegna, Gabriele, Sotgiu, Angelo, Demontis, Ambra, Biggio, Battista, Gori, Marco, and Roli, Fabio
- Subjects
- *
MARGINAL distributions , *FIRST-order logic , *EPISTEMIC logic , *DATA distribution , *NAIVE Bayes classification , *SUPERVISED learning , *MACHINE learning - Abstract
Adversarial attacks on machine learning-based classifiers, along with defense mechanisms, have been widely studied in the context of single-label classification problems. In this paper, we shift the attention to multi-label classification, where the availability of domain knowledge on the relationships among the considered classes may offer a natural way to spot incoherent predictions, i.e., predictions associated to adversarial examples lying outside of the training data distribution. We explore this intuition in a framework in which first-order logic knowledge is converted into constraints and injected into a semi-supervised learning problem. Within this setting, the constrained classifier learns to fulfill the domain knowledge over the marginal distribution, and can naturally reject samples with incoherent predictions. Even though our method does not exploit any knowledge of attacks during training, our experimental analysis surprisingly unveils that domain-knowledge constraints can help detect adversarial examples effectively, especially if such constraints are not known to the attacker. We show how to implement an adaptive attack exploiting knowledge of the constraints and, in a specifically-designed setting, we provide experimental comparisons with popular state-of-the-art attacks. We believe that our approach may provide a significant step towards designing more robust multi-label classifiers. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
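The rejection mechanism in this record rests on turning a logic rule into a differentiable constraint whose violation flags incoherent predictions. A toy sketch with one hypothetical rule, dog(x) → animal(x), relaxed as a hinge penalty (the paper handles general first-order logic knowledge, not just a single implication):

```python
def incoherence(p_dog, p_animal):
    """Fuzzy relaxation of the rule dog(x) -> animal(x): the penalty
    max(0, p_dog - p_animal) is zero exactly when the predicted
    probabilities satisfy the implication (illustrative)."""
    return max(0.0, p_dog - p_animal)

def reject(p_dog, p_animal, threshold=0.3):
    # Flag samples whose multi-label prediction violates domain knowledge,
    # e.g., adversarial examples pushed outside the data distribution.
    return incoherence(p_dog, p_animal) > threshold

coherent = reject(0.9, 0.95)     # "dog" and "animal" both high: kept
suspicious = reject(0.9, 0.1)    # "dog but not animal": rejected
```

An adversarial perturbation that flips one label without respecting the label dependencies trips exactly this kind of constraint, which is why knowledge unknown to the attacker helps detection.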
33. Hyperbolic Deep Neural Networks: A Survey.
- Author
-
Peng, Wei, Varanka, Tuomas, Mostafa, Abdelrahman, Shi, Henglin, and Zhao, Guoying
- Subjects
- *
ARTIFICIAL neural networks , *COMPUTER vision , *HYPERBOLIC spaces , *NATURAL language processing , *CORPORATE finance , *STIMULUS generalization - Abstract
Recently, hyperbolic deep neural networks (HDNNs) have been gaining momentum as the deep representations in the hyperbolic space provide high fidelity embeddings with few dimensions, especially for data possessing hierarchical structure. Such a hyperbolic neural architecture has quickly been extended to different scientific fields, including natural language processing, single-cell RNA-sequence analysis, graph embedding, financial analysis, and computer vision. The promising results demonstrate its superior capability, significant compactness of the model, and substantially better physical interpretability than its counterpart in Euclidean space. To stimulate future research, this paper presents a comprehensive review of the literature around the neural components in the construction of HDNNs, as well as the generalization of the leading deep approaches to the hyperbolic space. It also presents current applications of various tasks, together with insightful observations, identifying open questions and promising future directions. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
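A concrete taste of why hyperbolic embeddings suit hierarchies: in the Poincaré ball, the standard metric used by many HDNN components, distances grow without bound as points approach the boundary, leaving exponential room for tree-like data. The formula below is the standard Poincaré distance, not anything specific to this survey:

```python
import numpy as np

def poincare_distance(u, v):
    """Geodesic distance in the Poincare ball:
    d(u, v) = arcosh(1 + 2||u-v||^2 / ((1-||u||^2)(1-||v||^2)))."""
    nu = np.dot(u, u)
    nv = np.dot(v, v)
    duv = np.dot(u - v, u - v)
    return np.arccosh(1.0 + 2.0 * duv / ((1.0 - nu) * (1.0 - nv)))

origin = np.zeros(2)
near = np.array([0.1, 0.0])      # close to the origin
far = np.array([0.95, 0.0])      # close to the boundary

d_near = poincare_distance(origin, near)   # equals 2 * artanh(0.1)
d_far = poincare_distance(origin, far)     # blows up near the boundary
```

Placing a hierarchy's root near the origin and its leaves near the boundary gives low-distortion embeddings in very few dimensions, the compactness advantage the abstract highlights.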
34. Improved Normalized Cut for Multi-View Clustering.
- Author
-
Zhong, Guo and Pun, Chi-Man
- Subjects
- *
MATRIX decomposition , *LINEAR programming , *DATA structures , *SOFTWARE measurement - Abstract
Spectral clustering (SC) algorithms have been successful in discovering meaningful patterns since they can group arbitrarily shaped data structures. Traditional SC approaches typically consist of two sequential stages, i.e., performing spectral decomposition of an affinity matrix and then rounding the relaxed continuous clustering result into a binary indicator matrix. However, such a two-stage process can make the obtained binary indicator matrix severely deviate from the ground-truth one, because the former step is not devoted to achieving an optimal clustering result. To alleviate this issue, this paper presents a general joint framework to simultaneously learn the optimal continuous and binary indicator matrices for multi-view clustering, which also has the ability to tackle the conventional single-view case. Specifically, we provide theoretical proof for the proposed method. Furthermore, an effective alternating updating algorithm is developed to optimize the corresponding complex objective. A number of empirical results on different benchmark datasets demonstrate that the proposed method outperforms several state-of-the-art methods in terms of six clustering metrics. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
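The two-stage pipeline this record improves on is short enough to sketch: relax (take Laplacian eigenvectors of a fused affinity), then round the continuous result to a binary indicator. Averaging per-view affinities is a deliberately naive multi-view fusion used here only for illustration; the paper's point is that learning the continuous and binary matrices jointly avoids the deviation this separate rounding step can cause.

```python
import numpy as np

def two_stage_sc(affinities):
    """Classic two-stage spectral clustering on a fused affinity.

    Stage 1: relax -- take the second Laplacian eigenvector.
    Stage 2: round -- threshold it into a binary indicator.
    """
    W = sum(affinities) / len(affinities)        # naive multi-view fusion
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)
    continuous = vecs[:, 1]                      # relaxed indicator
    binary = (continuous > np.median(continuous)).astype(int)
    return continuous, binary

# Two 3-node clusters seen through two noisy views of the same structure.
blocks = np.kron(np.eye(2), np.ones((3, 3)))
view1 = blocks + 0.05
view2 = blocks + 0.10
cont, binary = two_stage_sc([view1, view2])
```

On this clean toy the rounding is harmless; on real affinities the relaxed eigenvectors need not round to a good indicator, which motivates the paper's joint formulation.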
35. Collaborative Learning of Label Semantics and Deep Label-Specific Features for Multi-Label Classification.
- Author
-
Hang, Jun-Yi and Zhang, Min-Ling
- Subjects
- *
COLLABORATIVE learning , *SEMANTICS , *CLASSIFICATION algorithms , *SUPERVISED learning , *DEEP learning - Abstract
In multi-label classification, the strategy of label-specific features has been shown to be effective for learning from multi-label examples by accounting for the distinct discriminative properties of each class label. However, most existing approaches exploit the semantic relations among labels as immutable prior knowledge, which may not be appropriate to constrain the learning process of label-specific features. In this paper, we propose to learn label semantics and label-specific features in a collaborative way. Accordingly, a deep neural network (DNN) based approach named Clif, i.e., Collaborative Learning of label semantIcs and deep label-specific Features for multi-label classification, is proposed. By integrating a graph autoencoder for encoding semantic relations in the label space and a tailored feature-disentangling module for extracting label-specific features, Clif is able to employ the learned label semantics to guide the mining of label-specific features and propagate label-specific discriminative properties to the learning process of the label semantics. In such a way, the learning of label semantics and label-specific features interact with and facilitate each other, so that label semantics can provide more accurate guidance to label-specific feature learning. Comprehensive experiments on 14 benchmark data sets show that our approach outperforms other well-established multi-label classification algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
36. Orientation Keypoints for 6D Human Pose Estimation.
- Author
-
Fisch, Martin and Clark, Ronald
- Subjects
- *
COMPUTER-generated imagery , *RANGE of motion of joints , *POSE estimation (Computer vision) , *POINT set theory , *ROTATIONAL motion , *COMPUTER vision , *MOTION capture (Human mechanics) - Abstract
Most realtime human pose estimation approaches are based on detecting joint positions. Using the detected joint positions, the yaw and pitch of the limbs can be computed. However, the roll along the limb, which is critical for applications such as sports analysis and computer animation, cannot be computed, as this axis of rotation remains unobserved. In this paper we therefore introduce orientation keypoints, a novel approach for estimating the full position and rotation of skeletal joints using only single-frame RGB images. Inspired by how motion-capture systems use a set of point markers to estimate full bone rotations, our method uses virtual markers to generate sufficient information to accurately infer rotations with simple post-processing. The rotation predictions improve upon the best reported mean error for joint angles by 48% and achieve 93% accuracy across 15 bone rotations. The method also improves the current state-of-the-art results for joint positions by 14% as measured by MPJPE on the principal dataset, and generalizes well to in-the-wild datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
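The motion-capture trick this record borrows, recovering a full bone rotation from a handful of point markers, is classically solved by orthogonal Procrustes (the Kabsch algorithm). The sketch below shows that step in isolation; the paper's contribution is predicting the virtual markers from a single RGB image, not this solver.

```python
import numpy as np

def rotation_from_markers(ref, obs):
    """Recover the rotation R (row-vector convention: obs ~ ref @ R + t)
    from corresponding marker sets via the Kabsch algorithm."""
    a = ref - ref.mean(axis=0)          # center both marker clouds
    b = obs - obs.mean(axis=0)
    u, _, vt = np.linalg.svd(a.T @ b)
    d = np.sign(np.linalg.det(u @ vt))  # guard against reflections
    return u @ np.diag([1.0, 1.0, d]) @ vt

rng = np.random.default_rng(0)
markers = rng.normal(size=(5, 3))       # virtual markers on one bone

# Ground-truth pose: 90-degree rotation about z, plus a translation.
R_true = np.array([[0.0, 1.0, 0.0],
                   [-1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])
obs = markers @ R_true + np.array([0.3, -0.2, 1.0])

R_est = rotation_from_markers(markers, obs)
```

Note that three or more non-collinear markers pin down all three rotation axes, including the roll along the limb that joint positions alone cannot observe.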
37. luvHarris: A Practical Corner Detector for Event-Cameras.
- Author
-
Glover, Arren, Dinale, Aiko, Rosa, Leandro De Souza, Bamford, Simeon, and Bartolozzi, Chiara
- Subjects
- *
DETECTORS , *COMPUTER vision , *CAMERAS - Abstract
A number of corner detection methods have been proposed for event cameras in recent years, since event-driven computer vision has become more accessible. The current state-of-the-art has either unsatisfactory accuracy or unsatisfactory real-time performance when considered for practical use, for example when a camera is randomly moved in an unconstrained environment. In this paper, we present yet another method to perform corner detection, dubbed look-up event-Harris (luvHarris), that employs the Harris algorithm for high accuracy but manages an improved event throughput. Our method has two major contributions: 1. a novel “threshold ordinal event-surface” that removes certain tuning parameters and is well suited for Harris operations, and 2. an implementation of the Harris algorithm such that the computational load per event is minimised and computationally heavy convolutions are performed only ‘as-fast-as-possible’, i.e., only as computational resources are available. The result is a practical, real-time, and robust corner detector that runs at more than $2.6\times$ the speed of the current state-of-the-art; a necessity when using a high-resolution event-camera in real-time. We explain the considerations taken in the approach, compare the algorithm to the current state-of-the-art in terms of computational performance and detection accuracy, and discuss the validity of the proposed approach for event cameras. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
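The scoring that luvHarris applies to its event-surface is the plain Harris response. The sketch below shows only that classical score on small synthetic patches (the threshold ordinal event-surface construction, the record's actual contribution, is omitted): corners excite both gradient directions, edges only one, so the determinant-minus-trace score separates them.

```python
import numpy as np

def harris_score(patch, k=0.04):
    """Classical Harris corner response on a small image patch:
    det(M) - k * trace(M)^2, with M the summed structure tensor."""
    gy, gx = np.gradient(patch.astype(float))
    sxx = (gx * gx).sum()
    syy = (gy * gy).sum()
    sxy = (gx * gy).sum()
    return sxx * syy - sxy ** 2 - k * (sxx + syy) ** 2

corner = np.zeros((9, 9)); corner[4:, 4:] = 1.0   # L-shaped corner
edge = np.zeros((9, 9)); edge[:, 4:] = 1.0        # straight vertical edge
flat = np.zeros((9, 9))                            # no structure

s_corner = harris_score(corner)
s_edge = harris_score(edge)
s_flat = harris_score(flat)
```

luvHarris's speed comes from evaluating this response on a look-up surface updated per event, rather than re-running the heavy convolutions for every event.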
38. Multi-Attribute Discriminative Representation Learning for Prediction of Adverse Drug-Drug Interaction.
- Author
-
Zhu, Jiajing, Liu, Yongguo, Zhang, Yun, Chen, Zhi, and Wu, Xindong
- Subjects
- *
DRUG interactions , *GENERATIVE adversarial networks , *VIDEO coding , *MATRIX decomposition , *FEATURE selection - Abstract
Adverse drug-drug interaction (ADDI) is a significant life-threatening issue and a leading cause of hospitalizations and deaths in healthcare systems. This paper proposes a unified Multi-Attribute Discriminative Representation Learning (MADRL) model for ADDI prediction. Unlike existing works that treat the features of each attribute equally, without discrimination, and do not consider the underlying relationship among drugs, we first develop a regularized optimization problem based on CUR matrix decomposition for joint representative-drug and discriminative-feature selection, such that the selected drugs and features can well approximate the original feature spaces and the critical factors discriminative to ADDIs can be properly explored. Different from existing models that ignore the consistent and unique properties among attributes, a Generative Adversarial Network (GAN) framework is then designed to capture the inter-attribute shared and intra-attribute specific representations of adverse drug pairs for exploiting their consensus and complementary information in ADDI prediction. Meanwhile, MADRL is compatible with any kind of attribute and capable of exploring their respective effects on ADDI prediction. An iterative algorithm based on the alternating direction method of multipliers is developed for optimization. Experiments on a publicly available dataset demonstrate the effectiveness of MADRL when compared with eleven baselines and its six variants. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
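The CUR decomposition underlying this record's drug/feature selection is easy to demonstrate: pick actual columns C and rows R of the data matrix and a small core U = C⁺AR⁺; when the selected columns and rows span the matrix's column and row spaces, A = CUR exactly. The column/row indices below are arbitrary stand-ins for the paper's learned selection.

```python
import numpy as np

rng = np.random.default_rng(0)

# Rank-2 toy "drug x feature" matrix.
A = rng.normal(size=(8, 2)) @ rng.normal(size=(2, 6))

cols = [0, 3]                 # hypothetical selected features
rows = [1, 4]                 # hypothetical selected drugs
C = A[:, cols]                # actual columns: interpretable features
R = A[rows, :]                # actual rows: representative drugs
U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)

A_hat = C @ U @ R
err = np.linalg.norm(A - A_hat) / np.linalg.norm(A)
```

Unlike an SVD, the factors C and R are real drugs and real features, which is why CUR-based selection keeps the discriminative factors interpretable.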
39. Learning and Meshing From Deep Implicit Surface Networks Using an Efficient Implementation of Analytic Marching.
- Author
-
Lei, Jiabao, Jia, Kui, and Ma, Yi
- Subjects
- *
DEEP learning , *COMPUTER vision , *MULTILAYER perceptrons , *COMPUTER graphics , *SURFACE reconstruction , *PARALLEL programming , *IMPLICIT functions - Abstract
Reconstruction of object or scene surfaces has tremendous applications in computer vision, computer graphics, and robotics. The topic attracts increased attention with the emerging pipeline of deep learning surface reconstruction, where implicit field functions constructed from deep networks (e.g., multi-layer perceptrons or MLPs) are proposed for generative shape modeling. In this paper, we study a fundamental problem in this context about recovering a surface mesh from an implicit field function whose zero-level set captures the underlying surface. To achieve the goal, existing methods rely on traditional meshing algorithms (e.g., the de-facto standard marching cubes); while promising, they suffer from loss of precision learned in the implicit surface networks, due to the use of discrete space sampling in marching cubes. Given that an MLP with activations of Rectified Linear Unit (ReLU) partitions its input space into a number of linear regions, we are motivated to connect this local linearity with a same property owned by the desired result of polygon mesh. More specifically, we identify from the linear regions, partitioned by an MLP based implicit function, the analytic cells and analytic facesthat are associated with the function's zero-level isosurface. We prove that under mild conditions, the identified analytic faces are guaranteed to connect and form a closed, piecewise planar surface. Based on the theorem, we propose an algorithm of analytic marching, which marches among analytic cells to exactly recover the mesh captured by an implicit surface network. We also show that our theory and algorithm are equally applicable to advanced MLPs with shortcut connections and max pooling. Given the parallel nature of analytic marching, we contribute AnalyticMesh, a software package that supports efficient meshing of implicit surface networks via CUDA parallel computing, and mesh simplification for efficient downstream processing. 
We apply our method to different settings of generative shape modeling using implicit surface networks. Extensive experiments demonstrate our advantages over existing methods in terms of both meshing accuracy and efficiency. Codes are at https://github.com/Karbo123/AnalyticMesh. [ABSTRACT FROM AUTHOR]
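The local linearity that the abstract above builds on can be illustrated in a few lines. The sketch below (plain NumPy with an arbitrary randomly initialized network, not the authors' AnalyticMesh code) extracts the exact affine function a ReLU MLP computes on the linear region containing a query point; within that region the zero-level set is a plane, which is the building block from which analytic cells and faces are identified:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny ReLU MLP f: R^3 -> R standing in for an implicit surface network,
# with f(x) = 0 as the surface.
W1, b1 = rng.standard_normal((16, 3)), rng.standard_normal(16)
W2, b2 = rng.standard_normal((1, 16)), rng.standard_normal(1)

def f(x):
    return (W2 @ np.maximum(W1 @ x + b1, 0.0) + b2).item()

def local_affine(x):
    """Within the linear region (fixed ReLU activation pattern) containing x,
    f is exactly affine: f(y) = w @ y + b. Return (w, b)."""
    mask = (W1 @ x + b1 > 0).astype(float)   # activation pattern at x
    w = (W2 * mask) @ W1                     # effective weight, shape (1, 3)
    b = (W2 * mask) @ b1 + b2                # effective bias
    return w.ravel(), b.item()

x = rng.standard_normal(3)
w, b = local_affine(x)
assert np.isclose(w @ x + b, f(x))   # the affine form reproduces the network
```

Marching among regions then amounts to crossing the region boundaries (the hyperplanes where individual pre-activations change sign) and recomputing `(w, b)` on each side.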
- Published
- 2022
- Full Text
- View/download PDF
40. A Novel Approach to Large-Scale Dynamically Weighted Directed Network Representation.
- Author
-
Luo, Xin, Wu, Hao, Wang, Zhi, Wang, Jianjun, and Meng, Deyu
- Subjects
- *
LAGRANGIAN functions - Abstract
A dynamically weighted directed network (DWDN) is frequently encountered in various big data-related applications, such as the terminal interaction pattern analysis system (TIPAS) considered in this study. It consists of large-scale dynamic interactions among numerous nodes. As the number of involved nodes increases drastically, it becomes impossible to observe their full interactions at each time slot, making the resultant DWDN High Dimensional and Incomplete (HDI). An HDI DWDN, in spite of its incompleteness, contains rich knowledge regarding the involved nodes' various behavior patterns. To extract such knowledge from an HDI DWDN, this paper proposes a novel Alternating direction method of multipliers (ADMM)-based Nonnegative Latent-factorization of Tensors (ANLT) model. It adopts three-fold ideas: a) building a data-density-oriented augmented Lagrangian function for efficiently handling an HDI tensor's incompleteness and nonnegativity; b) splitting the optimization task in each iteration into an elaborately designed series of subtasks, where each is solved on the basis of the previously solved ones following the ADMM principle to achieve fast convergence; and c) theoretically proving that convergence is guaranteed under its efficient learning scheme. Experimental results on six DWDNs from real applications demonstrate that the proposed ANLT significantly outperforms state-of-the-art models in both computational efficiency and prediction accuracy for missing links of an HDI DWDN. Hence, this study proposes a novel and efficient approach to large-scale DWDN representation.
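The latent-factorization idea can be illustrated with a much simpler stand-in for the paper's ADMM-based ANLT: projected stochastic gradient descent over only the observed entries of a synthetic HDI tensor. All sizes, learning rates, and names below are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, R = 20, 10, 4                      # nodes, time slots, latent rank

# Synthesize an HDI tensor: only ~10% of (source, target, time) entries
# are observed, each a nonnegative interaction weight.
true = [np.abs(rng.standard_normal((d, R))) for d in (N, N, T)]
vals = {(i, j, t): float(true[0][i] @ (true[1][j] * true[2][t]))
        for i in range(N) for j in range(N) for t in range(T)
        if rng.random() < 0.1}

# Nonnegative latent factors, updated only on observed entries
# (data-density oriented), projected onto the nonnegative orthant.
U, V, S = (np.abs(rng.standard_normal((d, R))) * 0.3 for d in (N, N, T))
lr = 0.05
for _ in range(200):
    for (i, j, t), y in vals.items():
        e = float(U[i] @ (V[j] * S[t])) - y
        gU, gV, gS = e * V[j] * S[t], e * U[i] * S[t], e * U[i] * V[j]
        U[i] = np.maximum(U[i] - lr * gU, 0.0)
        V[j] = np.maximum(V[j] - lr * gV, 0.0)
        S[t] = np.maximum(S[t] - lr * gS, 0.0)

rmse = np.sqrt(np.mean([(float(U[i] @ (V[j] * S[t])) - y) ** 2
                        for (i, j, t), y in vals.items()]))
```

The paper's contribution is precisely in replacing this naive scheme with an augmented-Lagrangian/ADMM subtask series that converges faster with a guarantee; the sketch only shows the data layout and the nonnegativity projection.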
- Published
- 2022
- Full Text
- View/download PDF
41. Ratio Sum Versus Sum Ratio for Linear Discriminant Analysis.
- Author
-
Wang, Jingyu, Wang, Hongmei, Nie, Feiping, and Li, Xuelong
- Subjects
- *
FISHER discriminant analysis , *SINGULAR value decomposition , *MATRIX inversion , *COVARIANCE matrices , *MATHEMATICAL optimization - Abstract
Dimension reduction is a critical technology for high-dimensional data processing, where Linear Discriminant Analysis (LDA) and its variants are effective supervised methods. However, LDA tends to prefer features with smaller variance, which causes features with weak discriminative ability to be retained. In this paper, we propose a novel Ratio Sum formulation for Linear Discriminant Analysis (RSLDA), which aims at maximizing the discriminative ability of each feature in the subspace. To be specific, it maximizes the sum of the ratios of the between-class distance to the within-class distance in each dimension of the subspace. Since a closed-form solution to the original RSLDA problem is difficult to obtain, an equivalent problem is developed that can be solved by an alternating optimization algorithm. For solving the equivalent problem, it is transformed into two sub-problems, one of which can be solved directly, while the other is converted into a convex optimization problem, where singular value decomposition is employed instead of matrix inversion. Consequently, the performance of the algorithm is not affected by the singularity of the covariance matrix. Furthermore, Kernel RSLDA (KRSLDA) is presented to improve the robustness of RSLDA. Additionally, the time complexities of RSLDA and KRSLDA are analyzed. Extensive experiments show that RSLDA and KRSLDA outperform other comparison methods on toy datasets and multiple public datasets.
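The ratio-sum objective itself is easy to state in code. The sketch below (a hypothetical helper in plain NumPy, not the authors' solver) evaluates it for a given projection `W`, summing the per-dimension between-class/within-class scatter ratios rather than taking classical LDA's single trace ratio:

```python
import numpy as np

def ratio_sum(W, X, y):
    """Sum over subspace dimensions i of (w_i^T Sb w_i) / (w_i^T Sw w_i),
    the objective RSLDA maximizes."""
    mu = X.mean(axis=0)
    Sb = np.zeros((X.shape[1], X.shape[1]))
    Sw = np.zeros_like(Sb)
    for c in np.unique(y):
        Xc = X[y == c]
        diff = Xc.mean(axis=0) - mu
        Sb += len(Xc) * np.outer(diff, diff)          # between-class scatter
        Sw += (Xc - Xc.mean(axis=0)).T @ (Xc - Xc.mean(axis=0))  # within-class
    num = np.einsum('di,dq,qi->i', W, Sb, W)   # w_i^T Sb w_i per dimension
    den = np.einsum('di,dq,qi->i', W, Sw, W)   # w_i^T Sw w_i per dimension
    return float(np.sum(num / den))

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, (50, 4)), rng.normal(3.0, 1.0, (50, 4))])
y = np.repeat([0, 1], 50)
two_dims = ratio_sum(np.eye(4)[:, :2], X, y)   # project onto first 2 axes
one_dim = ratio_sum(np.eye(4)[:, :1], X, y)
assert two_dims > one_dim > 0.0                # per-dimension ratios accumulate
```

Because each dimension contributes its own ratio, a direction with small within-class variance but negligible between-class separation cannot dominate the objective the way it can in the trace-ratio formulation.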
- Published
- 2022
- Full Text
- View/download PDF
42. Bridging the Gap Between Few-Shot and Many-Shot Learning via Distribution Calibration.
- Author
-
Yang, Shuo, Wu, Songhua, Liu, Tongliang, and Xu, Min
- Subjects
- *
GAUSSIAN distribution , *CALIBRATION , *APPROXIMATION error , *BRIDGES , *DATA distribution - Abstract
A major gap between few-shot and many-shot learning is the data distribution empirically observed by the model during training. In few-shot learning, the learned model can easily become over-fitted to the biased distribution formed by only a few training examples, while in many-shot learning the ground-truth data distribution is uncovered more accurately, allowing a well-generalized model to be learned. In this paper, we propose to calibrate the distribution of these few-sample classes to be more unbiased to alleviate such an over-fitting problem. The distribution calibration is achieved by transferring statistics from the classes with sufficient examples to those few-sample classes. After calibration, an adequate number of examples can be sampled from the calibrated distribution to expand the inputs to the classifier. Specifically, we assume every dimension in the feature representation from the same class follows a Gaussian distribution, so that the mean and the variance of the distribution can be borrowed from those of similar classes whose statistics are better estimated with an adequate number of samples. Extensive experiments on three datasets, miniImageNet, tieredImageNet, and CUB, show that a simple linear classifier trained using the features sampled from our calibrated distribution can outperform the state-of-the-art accuracy by a large margin. Besides the favorable performance, the proposed method also exhibits high flexibility, showing consistent accuracy improvement when built on top of any off-the-shelf pretrained feature extractors and classification models without extra learnable parameters. The visualization of the generated features demonstrates that our calibrated distribution is an accurate estimation, and thus the gain in generalization ability is convincing.
We also establish a generalization error bound for the proposed distribution-calibration-based few-shot learning, which consists of the distribution assumption error, the distribution approximation error, and the estimation error. This generalization error bound theoretically justifies the effectiveness of the proposed method. [ABSTRACT FROM AUTHOR]
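The calibration step can be sketched in a few lines of NumPy. The base-class statistics, feature dimension, and hyperparameters below are synthetic placeholders, and the published method additionally transforms features (e.g., with a power transform) before calibration, which is omitted here:

```python
import numpy as np

rng = np.random.default_rng(3)
D = 64

# Statistics of base (many-shot) classes, assumed well estimated.
base_means = rng.standard_normal((50, D))
base_vars = np.abs(rng.standard_normal((50, D))) + 0.5

def calibrate(support, k=2, alpha=0.2):
    """Borrow mean/variance from the k base classes nearest to the
    few-shot prototype; alpha adds extra dispersion."""
    proto = support.mean(axis=0)
    nn = np.argsort(np.linalg.norm(base_means - proto, axis=1))[:k]
    mean = (base_means[nn].sum(axis=0) + proto) / (k + 1)
    var = base_vars[nn].mean(axis=0) + alpha
    return mean, var

support = rng.standard_normal((5, D)) + 1.0     # a 5-shot novel class
mean, var = calibrate(support)
# Expand the classifier's training set by sampling from the calibrated
# per-dimension Gaussians.
samples = rng.normal(mean, np.sqrt(var), size=(100, D))
```

A plain linear classifier can then be fit on the support features plus `samples`, which is the "expand the inputs to the classifier" step described in the abstract.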
- Published
- 2022
- Full Text
- View/download PDF
43. Syntax Customized Video Captioning by Imitating Exemplar Sentences.
- Author
-
Yuan, Yitian, Ma, Lin, and Zhu, Wenwu
- Subjects
- *
SYNTAX (Grammar) , *VIDEO compression , *SEMANTICS , *RECURRENT neural networks - Abstract
Enhancing the diversity of sentences that describe video contents is an important problem arising in recent video captioning research. In this paper, we explore this problem from a novel perspective: customizing video captions by imitating exemplar sentence syntaxes. Specifically, given a video and any syntax-valid exemplar sentence, we introduce a new task of Syntax Customized Video Captioning (SCVC), which aims to generate a caption that not only semantically describes the video contents but also syntactically imitates the given exemplar sentence. To tackle the SCVC task, we propose a novel video captioning model, where a hierarchical sentence syntax encoder is first designed to extract the syntactic structure of the exemplar sentence, and a syntax-conditioned caption decoder is then devised to generate the syntactically structured caption expressing video semantics. As no syntax-customized ground-truth video captions are available, we tackle this challenge by proposing a new training strategy, which leverages traditional pairwise video captioning data and our collected exemplar sentences to accomplish model learning. Extensive experiments, in terms of semantic, syntactic, fluency, and diversity evaluations, clearly demonstrate our model's capability to generate syntax-varied and semantics-coherent video captions that well imitate different exemplar sentences with enriched diversity. Code is available at https://github.com/yytzsy/Syntax-Customized-Video-Captioning.
- Published
- 2022
- Full Text
- View/download PDF
44. Background-Click Supervision for Temporal Action Localization.
- Author
-
Yang, Le, Han, Junwei, Zhao, Tao, Lin, Tianwei, Zhang, Dingwen, and Chen, Jianxin
- Subjects
- *
SUPERVISION , *ACTIVE learning , *VIDEO compression , *MARKOV processes , *TASK analysis , *MATHEMATICAL convolutions - Abstract
Weakly supervised temporal action localization aims at learning the instance-level action pattern from video-level labels, where a significant challenge is action-context confusion. To overcome this challenge, one recent work builds an action-click supervision framework. It requires similar annotation costs but can steadily improve localization performance compared to conventional weakly supervised methods. In this paper, by revealing that the performance bottleneck of the existing approaches mainly comes from background errors, we find that a stronger action localizer can be trained with labels on the background video frames rather than on the action frames. To this end, we convert action-click supervision to background-click supervision and develop a novel method, called BackTAL. Specifically, BackTAL implements two-fold modeling of the background video frames, i.e., position modeling and feature modeling. In position modeling, we not only conduct supervised learning on the annotated video frames but also design a score separation module to enlarge the score differences between potential action frames and backgrounds. In feature modeling, we propose an affinity module to measure frame-specific similarities among neighboring frames and dynamically attend to informative neighbors when calculating temporal convolutions. Extensive experiments are conducted on three benchmarks, demonstrating the high performance of BackTAL and the soundness of the proposed background-click supervision.
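A minimal sketch of the position-modeling idea, assuming a per-frame actionness logit and a set of annotated background-frame indices; the loss form, margin, and top-k values are illustrative, not BackTAL's exact formulation:

```python
import numpy as np

def background_click_loss(logits, bg_idx, margin=1.0, topk=5):
    """Binary cross-entropy on annotated background frames plus a
    score-separation term that enlarges the gap between the top-k
    (potential action) scores and the background scores."""
    p = 1.0 / (1.0 + np.exp(-logits))               # per-frame actionness
    bce = -np.mean(np.log(1.0 - p[bg_idx] + 1e-8))  # backgrounds score low
    action = np.sort(p)[-topk:].mean()              # likely action frames
    sep = max(0.0, margin - (action - p[bg_idx].mean()))
    return bce + sep

T, bg_idx = 50, np.arange(10)
good = np.full(T, -5.0); good[40:45] = 5.0  # backgrounds low, actions high
bad = np.zeros(T)                           # uninformative scores
assert background_click_loss(good, bg_idx) < background_click_loss(bad, bg_idx)
```

The point of the sketch is that only background frames need annotations; the separation term provides the push on (unannotated) potential action frames.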
- Published
- 2022
- Full Text
- View/download PDF
45. Point Cloud Instance Segmentation With Semi-Supervised Bounding-Box Mining.
- Author
-
Liao, Yongbin, Zhu, Hongyuan, Zhang, Yanggang, Ye, Chuangguan, Chen, Tao, and Fan, Jiayuan
- Subjects
- *
POINT cloud , *SUPERVISED learning , *DEEP learning - Abstract
Point cloud instance segmentation has achieved huge progress with the emergence of deep learning. However, these methods are usually data-hungry, requiring expensive and time-consuming dense point cloud annotations. Unlabeled or weakly labeled data, which could alleviate the annotation cost, remains underexplored for this task. In this paper, we introduce the first semi-supervised point cloud instance segmentation framework (SPIB), using both labeled and unlabeled bounding boxes as supervision. To be specific, our SPIB architecture involves a two-stage learning procedure. In stage one, a bounding box proposal generation network is trained under a semi-supervised setting with perturbation consistency regularization (SPCR). The regularization works by enforcing invariance of the bounding box predictions over different perturbations applied to the input point clouds, providing self-supervision for network learning. In stage two, the bounding box proposals produced with SPCR are grouped into subsets, and the instance masks are mined inside each subset with a novel semantic propagation module and a property consistency graph module. Moreover, we introduce a novel occupancy-ratio-guided refinement module to refine the instance masks. Extensive experiments on the challenging ScanNet v2 dataset demonstrate that our method achieves competitive performance compared with recent fully supervised methods.
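The perturbation-consistency regularization of stage one can be sketched as follows, with Gaussian jitter plus a random z-axis rotation as an assumed perturbation set and the cloud centroid standing in as a toy rotation-equivariant "proposal network" (none of these choices are claimed to match the paper's):

```python
import numpy as np

rng = np.random.default_rng(4)

def perturb(points, sigma=0.01):
    """One assumed perturbation: Gaussian jitter plus a random rotation
    about the z-axis. Returns the perturbed cloud and the rotation."""
    a = rng.uniform(0.0, 2.0 * np.pi)
    R = np.array([[np.cos(a), -np.sin(a), 0.0],
                  [np.sin(a),  np.cos(a), 0.0],
                  [0.0, 0.0, 1.0]])
    return (points + rng.normal(0.0, sigma, points.shape)) @ R.T, R

def consistency_loss(predict, points):
    """SPCR-style self-supervision: predictions (here, box centers) on a
    perturbed cloud should match the transformed predictions on the
    original cloud."""
    centers = predict(points)
    perturbed, R = perturb(points)
    return float(np.mean((centers @ R.T - predict(perturbed)) ** 2))

points = rng.standard_normal((256, 3))
loss = consistency_loss(lambda p: p.mean(axis=0, keepdims=True), points)
```

For an equivariant predictor the loss stays near the jitter noise floor; for an unlabeled cloud, minimizing this quantity supplies the training signal without box annotations.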
- Published
- 2022
- Full Text
- View/download PDF
46. PRIN/SPRIN: On Extracting Point-Wise Rotation Invariant Features.
- Author
-
You, Yang, Lou, Yujing, Shi, Ruoxi, Liu, Qi, Tai, Yu-Wing, Ma, Lizhuang, Wang, Weiming, and Lu, Cewu
- Subjects
- *
ROTATIONAL motion , *POINT cloud , *DATA augmentation , *FEATURE extraction , *RECURRENT neural networks - Abstract
Point cloud analysis without pose priors is very challenging in real applications, as the orientations of point clouds are often unknown. In this paper, we propose a brand-new point-set learning framework, PRIN, namely Point-wise Rotation Invariant Network, focusing on rotation-invariant feature extraction in point cloud analysis. We construct spherical signals by Density-Aware Adaptive Sampling to deal with distorted point distributions in spherical space. Spherical Voxel Convolution and Point Re-sampling are proposed to extract rotation-invariant features for each point. In addition, we extend PRIN to a sparse version called SPRIN, which directly operates on sparse point clouds. Both PRIN and SPRIN can be applied to tasks ranging from object classification and part segmentation to 3D feature matching and label alignment. Results show that, on datasets with randomly rotated point clouds, SPRIN demonstrates better performance than state-of-the-art methods without any data augmentation. We also provide a thorough theoretical proof and analysis of the point-wise rotation invariance achieved by our methods. The code to reproduce our results will be made publicly available.
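Point-wise rotation invariance, the property PRIN/SPRIN learn, can be checked numerically. The sketch below uses a hand-crafted invariant descriptor (distance to the centroid and to the nearest neighbor) as a stand-in for the learned features, which is only an analogy, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(5)

def invariant_features(points):
    """A hand-crafted point-wise rotation-invariant descriptor: per-point
    distance to the centroid and to the nearest neighbor."""
    d_centroid = np.linalg.norm(points - points.mean(axis=0), axis=1)
    D = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    np.fill_diagonal(D, np.inf)                 # ignore self-distance
    return np.stack([d_centroid, D.min(axis=1)], axis=1)

def random_rotation():
    """A random 3D rotation via QR decomposition of a Gaussian matrix."""
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    return Q * np.sign(np.linalg.det(Q))        # force a proper rotation

points = rng.standard_normal((128, 3))
R = random_rotation()
# The descriptor is identical for the cloud and any rotated copy of it.
assert np.allclose(invariant_features(points), invariant_features(points @ R.T))
```

The same check (features unchanged under arbitrary rotations, without augmentation) is what the paper's theoretical analysis establishes for its learned spherical-voxel features.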
- Published
- 2022
- Full Text
- View/download PDF
47. Factors of Influence for Transfer Learning Across Diverse Appearance Domains and Task Types.
- Author
-
Mensink, Thomas, Uijlings, Jasper, Kuznetsova, Alina, Gygli, Michael, and Ferrari, Vittorio
- Subjects
- *
COMPUTER vision , *APPLICATION software , *AUTONOMOUS vehicles , *COMPUTER simulation , *TECHNOLOGY transfer , *IMAGE segmentation - Abstract
Transfer learning enables knowledge learned on a source task to be re-used to help learn a target task. A simple form of transfer learning is common in current state-of-the-art computer vision models, i.e., pre-training a model for image classification on the ILSVRC dataset, and then fine-tuning it on any target task. However, previous systematic studies of transfer learning have been limited, and the circumstances in which it is expected to work are not fully understood. In this paper we carry out an extensive experimental exploration of transfer learning across vastly different image domains (consumer photos, autonomous driving, aerial imagery, underwater, indoor scenes, synthetic, close-ups) and task types (semantic segmentation, object detection, depth estimation, keypoint detection). Importantly, these are all complex, structured-output task types relevant to modern computer vision applications. In total we carry out over 2000 transfer learning experiments, including many where the source and target come from different image domains, task types, or both. We systematically analyze these experiments to understand the impact of image domain, task type, and dataset size on transfer learning performance. Our study leads to several insights and concrete recommendations: (1) for most tasks there exists a source which significantly outperforms ILSVRC’12 pre-training; (2) the image domain is the most important factor for achieving positive transfer; (3) the source dataset should include the image domain of the target dataset to achieve best results; (4) at the same time, we observe only small negative effects when the image domain of the source task is much broader than that of the target; (5) transfer across task types can be beneficial, but its success is heavily dependent on both the source and target task types.
- Published
- 2022
- Full Text
- View/download PDF
48. Structured Multimodal Attentions for TextVQA.
- Author
-
Gao, Chenyu, Zhu, Qi, Wang, Peng, Li, Hui, Liu, Yuliang, Hengel, Anton van den, and Wu, Qi
- Subjects
- *
OPTICAL character recognition , *REPRESENTATIONS of graphs , *TEXT recognition , *NATURAL languages - Abstract
Text-based Visual Question Answering (TextVQA) is a recently raised challenge requiring models to read text in images and answer natural language questions by jointly reasoning over the question, textual information, and visual content. The introduction of this new modality, Optical Character Recognition (OCR) tokens, ushers in demanding reasoning requirements. Most state-of-the-art (SoTA) VQA methods fail when answering these questions for three reasons: (1) poor text reading ability; (2) lack of textual-visual reasoning capacity; and (3) choosing a discriminative answering mechanism over a generative counterpart (although this has been further addressed by M4C). In this paper, we propose an end-to-end structured multimodal attention (SMA) neural network to mainly address the first two issues above. SMA first uses a structural graph representation to encode the object-object, object-text, and text-text relationships appearing in the image, and then designs a multimodal graph attention network to reason over it. Finally, the outputs from the above modules are processed by a global-local attentional answering module to produce an answer, splicing together tokens from both the OCR and general vocabularies iteratively, following M4C. Our proposed model outperforms the SoTA models on the TextVQA dataset and two tasks of the ST-VQA dataset among all models except the pre-training-based TAP. Demonstrating strong reasoning ability, it also won first place in the TextVQA Challenge 2020. We extensively test different OCR methods on several reasoning models and investigate the impact of gradually increased OCR performance on the TextVQA benchmark. With better OCR results, different models share dramatic improvement in VQA accuracy, but our model benefits the most, thanks to its strong textual-visual reasoning ability.
To grant our method an upper bound and make a fair testing base available for further works, we also provide human-annotated ground-truth OCR annotations for the TextVQA dataset, which were not given in the original release. The code and ground-truth OCR annotations for the TextVQA dataset are available at https://github.com/ChenyuGAO-CS/SMA. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
49. Unsupervised Intrinsic Image Decomposition Using Internal Self-Similarity Cues.
- Author
-
Zhang, Qing, Zhou, Jin, Zhu, Lei, Sun, Wei, Xiao, Chunxia, and Zheng, Wei-Shi
- Subjects
- *
SUPERVISED learning , *ACOUSTIC surface waves , *DECOMPOSITION method - Abstract
Recent learning-based intrinsic image decomposition methods have achieved remarkable progress. However, they usually require massive ground-truth intrinsic images for supervised learning, which limits their applicability to real-world images, since obtaining ground-truth intrinsic decompositions for natural images is very challenging. In this paper, we present an unsupervised framework that is able to learn the decomposition effectively from a single natural image by training solely on the image itself. Our approach is built upon the observations that the reflectance of a natural image typically has high internal self-similarity of patches, and that a convolutional generation network tends to boost the self-similarity of an image when trained for image reconstruction. Based on these observations, an unsupervised intrinsic decomposition network (UIDNet) consisting of two fully convolutional encoder-decoder sub-networks, i.e., a reflectance prediction network (RPN) and a shading prediction network (SPN), is devised to decompose an image into reflectance and shading by promoting the internal self-similarity of the reflectance component, in a way that jointly trains the RPN and SPN to reproduce the given image. A novel loss function is also designed to make the training effective for intrinsic decomposition. Experimental results on three benchmark real-world datasets demonstrate the superiority of the proposed method.
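The internal self-similarity cue can be quantified with a simple patch-based score. The helper below is a rough stand-in for the observation the method builds on (reflectance patches repeat; shading and noise vary), not UIDNet itself:

```python
import numpy as np

def self_similarity(img, patch=4):
    """Mean distance from each non-overlapping patch to its nearest other
    patch; lower values mean higher internal self-similarity, the property
    the reflectance component is expected to exhibit."""
    H, W = img.shape
    P = np.stack([img[i:i + patch, j:j + patch].ravel()
                  for i in range(0, H - patch + 1, patch)
                  for j in range(0, W - patch + 1, patch)])
    D = np.linalg.norm(P[:, None] - P[None, :], axis=2)
    np.fill_diagonal(D, np.inf)                 # exclude each patch itself
    return float(D.min(axis=1).mean())

rng = np.random.default_rng(6)
flat = np.ones((32, 32))                    # reflectance-like: repetitive
textured = rng.standard_normal((32, 32))    # shading/noise-like: varied
assert self_similarity(flat) < self_similarity(textured)
```

Promoting a low score of this kind on the predicted reflectance, while the product of the two predictions reconstructs the input, is the intuition behind training RPN and SPN jointly on a single image.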
- Published
- 2022
- Full Text
- View/download PDF
50. Unsupervised Grouped Axial Data Modeling via Hierarchical Bayesian Nonparametric Models With Watson Distributions.
- Author
-
Fan, Wentao, Yang, Lin, and Bouguila, Nizar
- Subjects
- *
DATA modeling , *WATSON (Computer) , *INFERENTIAL statistics , *IMAGE analysis , *MATHEMATICAL optimization , *GENE expression , *MACHINE learning - Abstract
This paper aims at proposing an unsupervised hierarchical nonparametric Bayesian framework for modeling axial data (i.e., observations are axes of direction) that can be partitioned into multiple groups, where each observation within a group is sampled from a mixture of Watson distributions with an infinite number of components that are allowed to be shared across different groups. First, we propose a hierarchical nonparametric Bayesian model for modeling grouped axial data based on the hierarchical Pitman-Yor process mixture model of Watson distributions. Then, we demonstrate that by setting the discount parameters of the proposed model to 0, another hierarchical nonparametric Bayesian model based on hierarchical Dirichlet process can be derived for modeling axial data. To learn the proposed models, we systematically develop a closed-form optimization algorithm based on the collapsed variational Bayes (CVB) inference. Furthermore, to ensure the convergence of the proposed learning algorithm, an annealing mechanism is introduced to the framework of CVB inference, leading to an averaged collapsed variational Bayes inference strategy. The merits of the proposed models for modeling grouped axial data are demonstrated through experiments on both synthetic data and real-world applications involving gene expression data clustering and depth image analysis. [ABSTRACT FROM AUTHOR]
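The Watson distribution at the core of these models assigns equal density to antipodal points, which is exactly what makes it suitable for axial data (observations that are axes of direction). A minimal sketch of the unnormalized kernel follows; the normalizing constant, which involves Kummer's confluent hypergeometric function, is omitted:

```python
import numpy as np

def watson_log_kernel(x, mu, kappa):
    """Unnormalized log-density of the Watson distribution:
    log p(x) = kappa * (mu @ x)**2 + const, for unit vectors x and mu.
    Because (mu @ x) is squared, +x and -x (the same axis) get equal
    density."""
    return kappa * float(mu @ x) ** 2

rng = np.random.default_rng(7)
mu = np.array([0.0, 0.0, 1.0])              # mean axis
x = rng.standard_normal(3)
x /= np.linalg.norm(x)                      # a random unit axis
# Antipodal symmetry: the axis +/-x gets one density value.
assert np.isclose(watson_log_kernel(x, mu, 5.0),
                  watson_log_kernel(-x, mu, 5.0))
# For kappa > 0 the density concentrates around the mean axis +/-mu.
assert watson_log_kernel(mu, mu, 5.0) >= watson_log_kernel(x, mu, 5.0)
```

In the paper's mixtures, each component contributes such a kernel (properly normalized), with component weights governed by the hierarchical Pitman-Yor or Dirichlet process prior.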
- Published
- 2022
- Full Text
- View/download PDF