718 results for "Davis, Larry S."
Search Results
702. Relaxation Labeling: 25 Years and Still Iterating
- Author
-
Zucker, Steven W. and Davis, Larry S., editor
- Published
- 2001
- Full Text
- View/download PDF
703. From a Robust Hierarchy to a Hierarchy of Robustness
- Author
-
Meer, Peter and Davis, Larry S., editor
- Published
- 2001
- Full Text
- View/download PDF
704. On the Computational Modeling of Human Vision
- Author
-
Beck, Jacob and Davis, Larry S., editor
- Published
- 2001
- Full Text
- View/download PDF
705. Attributes driven tracklet-to-tracklet person re-identification using latent prototypes space mapping.
- Author
-
Su, Chi, Zhang, Shiliang, Yang, Fan, Zhang, Guangxiao, Tian, Qi, Gao, Wen, and Davis, Larry S.
- Subjects
- *BIOMETRIC identification, *PROTOTYPES, *CARTOGRAPHY, *IMAGE processing, *ALGORITHMS, *SEMANTIC networks (Information theory)
- Abstract
Most current person re-identification methods identify a person by matching his/her probe image against a gallery image set. One feasible way to improve identification accuracy is multi-shot re-identification, where the probe includes a small set of images rather than a single image. In this paper, we study tracklet-to-tracklet identification, where both the probe and the target dataset are composed of small sets of sequential images, i.e., tracklets. To solve this problem and make our algorithm robust in a multi-camera setting, we take full advantage of low-level features, attributes, and inter-attribute correlations at the same time. Attributes are expected to offer semantic information complementary to low-level features. In order to discover the correlations among attributes, a novel discriminative model is proposed to exploit low-level features and map attributes to a discriminative latent prototype space. An alternating optimization procedure is designed to perform the learning process. We also devise a number of voting schemes to aggregate matching scores from images to tracklets. Experiments on four public datasets show that our approach achieves consistently better performance than existing person re-identification methods. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
706. A non-parametric approach to extending generic binary classifiers for multi-classification.
- Author
-
Santhanam, Venkataraman, Morariu, Vlad I., Harwood, David, and Davis, Larry S.
- Subjects
- *PATTERN recognition systems, *NONPARAMETRIC estimation, *COMPUTER vision, *BINARY number system, *SUBSPACES (Mathematics), *QUANTITATIVE research
- Abstract
Ensemble methods, which combine generic binary classifier scores to generate a multi-classification output, are commonly used in state-of-the-art computer vision and pattern recognition systems that rely on multi-classification. In particular, we consider the one-vs-one decomposition of the multi-class problem, where binary classifier models are trained to discriminate every class pair. We describe a robust multi-classification pipeline, which at a high level involves projecting binary classifier scores into compact orthogonal subspaces, followed by a non-linear probabilistic multi-classification step, using Kernel Density Estimation (KDE). We compare our approach against state-of-the-art ensemble methods (DCS, DRCW) on 16 multi-class datasets. We also compare against the most commonly used ensemble methods (VOTE, NEST) on 6 real-world computer vision datasets. Finally, we measure the statistical significance of our approach using non-parametric tests. Experimental results show that our approach gives a statistically significant improvement in multi-classification performance over state-of-the-art. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
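The Kernel Density Estimation step described in entry 706 can be illustrated with a toy sketch. The simulated one-vs-one score layout below is invented, and the paper's projection of scores into compact orthogonal subspaces is omitted; this only shows the idea of picking the class with the highest score-space density.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Toy illustration of KDE-based multi-classification from one-vs-one
# binary classifier scores; the score layout is invented, and the paper's
# orthogonal-subspace projection step is omitted.
rng = np.random.default_rng(0)

# Simulated one-vs-one score vectors for 3 classes (3 pairwise scores).
train_scores = {
    0: rng.normal([2.0, 2.0, 0.0], 0.5, size=(50, 3)),
    1: rng.normal([-2.0, 0.0, 2.0], 0.5, size=(50, 3)),
    2: rng.normal([0.0, -2.0, -2.0], 0.5, size=(50, 3)),
}
kdes = {c: gaussian_kde(s.T) for c, s in train_scores.items()}

def classify(score_vec):
    # Pick the class whose score-space density is highest at this point.
    return max(kdes, key=lambda c: kdes[c](score_vec)[0])

print(classify(np.array([2.0, 2.0, 0.0])))  # 0
```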
707. Joint Image Clustering and Labeling by Matrix Factorization.
- Author
-
Hong, Seunghoon, Choi, Jonghyun, Feyereisl, Jan, Han, Bohyung, and Davis, Larry S.
- Subjects
- *ALGORITHM research, *IMAGE processing, *IMAGE databases, *LEARNING, *SEARCH engines
- Abstract
We propose a novel algorithm to cluster and annotate a set of input images jointly, where the images are clustered into several discriminative groups and each group is identified with representative labels automatically. For these purposes, each input image is first represented by a distribution of candidate labels based on its similarity to images in a labeled reference image database. A set of these label-based representations are then refined collectively through a non-negative matrix factorization with sparsity and orthogonality constraints; the refined representations are employed to cluster and annotate the input images jointly. The proposed approach demonstrates performance improvements in image clustering over existing techniques, and illustrates competitive image labeling accuracy in both quantitative and qualitative evaluation. In addition, we extend our joint clustering and labeling framework to solving the weakly-supervised image classification problem and obtain promising results. [ABSTRACT FROM PUBLISHER]
- Published
- 2016
- Full Text
- View/download PDF
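The factorization at the heart of entry 707 can be sketched with plain multiplicative-update NMF. The sparsity and orthogonality constraints the paper adds are omitted here, and V is random rather than a label-based image representation; this is an illustrative sketch, not the authors' algorithm.

```python
import numpy as np

# Toy non-negative matrix factorization via multiplicative updates; the
# paper's sparsity and orthogonality constraints are omitted, and V is
# random rather than a label-distribution representation of images.
rng = np.random.default_rng(0)

def nmf(V, k, iters=200, eps=1e-9):
    W = rng.random((V.shape[0], k))
    H = rng.random((k, V.shape[1]))
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

V = rng.random((20, 10))              # e.g. images x candidate labels
W, H = nmf(V, k=3)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {err:.2f}")
```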
708. Multi-Directional Multi-Level Dual-Cross Patterns for Robust Face Recognition.
- Author
-
Ding, Changxing, Choi, Jonghyun, Tao, Dacheng, and Davis, Larry S.
- Subjects
- *HUMAN facial recognition software, *ROBUST control, *IMAGE recognition (Computer vision), *TEXTURE analysis (Image processing), *BINARY control systems
- Abstract
To perform unconstrained face recognition robust to variations in illumination, pose and expression, this paper presents a new scheme to extract “Multi-Directional Multi-Level Dual-Cross Patterns” (MDML-DCPs) from face images. Specifically, the MDML-DCPs scheme exploits the first derivative of the Gaussian operator to reduce the impact of differences in illumination and then computes the DCP feature at both the holistic and component levels. DCP is a novel face image descriptor inspired by the unique textural structure of human faces. It is computationally efficient and only doubles the cost of computing local binary patterns, yet is extremely robust to pose and expression variations. MDML-DCPs comprehensively yet efficiently encodes the invariant characteristics of a face image from multiple levels into patterns that are highly discriminative of inter-personal differences but robust to intra-personal variations. Experimental results on the FERET, CAS-PEAL-R1, FRGC 2.0, and LFW databases indicate that DCP outperforms the state-of-the-art local descriptors (e.g., LBP, LTP, LPQ, POEM, tLBP, and LGXP) for both face identification and face verification tasks. More impressively, the best performance is achieved on the challenging LFW and FRGC 2.0 databases by deploying MDML-DCPs in a simple recognition scheme. [ABSTRACT FROM PUBLISHER]
- Published
- 2016
- Full Text
- View/download PDF
709. Screen-based active user authentication.
- Author
-
Fathy, Mohammed E., Patel, Vishal M., Yeh, Tom, Zhang, Yangmuzi, Chellappa, Rama, and Davis, Larry S.
- Subjects
- *HISTOGRAMS, *OPTICAL flow, *MICE (Computers), *COMPUTER science, *COMPUTER vision, *AUTOMATION
- Abstract
Highlights: [•] We compute from a recording an average of the histograms of oriented optical flows. [•] Four interaction types are considered: MouseMoving, Typing, Scrolling and Other. [•] It is possible to accurately determine the type of interaction in a screen recording. [•] Not all interaction types result in recordings that are good at verifying identity. [•] The scrolling interaction can verify identity with moderately low FAR/FRR. [Copyright © Elsevier]
- Published
- 2014
- Full Text
- View/download PDF
710. A data-driven detection optimization framework
- Author
-
Robson Schwartz, William, Hugo Cunha de Melo, Victor, Pedrini, Helio, and Davis, Larry S.
- Subjects
- *REGRESSION analysis, *COMPUTER science, *MATHEMATICAL optimization, *STATISTICAL decision making, *SIGNAL detection, *ACCURACY
- Abstract
Due to the large amount of data to be processed by visual applications aiming at extracting high-level understanding of the scene, low-level methods such as object detection are required to have not only high accuracy but also low computational cost in order to provide fast and reliable information. Training sets containing samples representing multiple scenes are used to learn object detectors that can be reliably used in different scenarios. In general, information extracted from multiple feature channels is combined to capture the large variability present in these different environments. Although this approach provides accurate detection results, it usually leads to a high computational cost. On the other hand, if characteristics of the scene are known beforehand, a set of simple and fast-computing features might be sufficient to provide high accuracy at a low computational cost. Therefore, it is valuable to seek a balance between these two extremes such that the detection method not only works well in different scenarios but also is able to extract enough information from a scene. We integrate a set of data-driven regression models with a multi-stage human detection method trained to be used in different environments. The regressions are used to estimate the detector response at each stage and the location of the objects. The use of the regression models allows the method to reject a large number of detection windows quickly. Experimental results based on human detection show that the addition of the regression models reduces the computational cost by as much as ten times with very small or no degradation in detection accuracy. [Copyright © Elsevier]
- Published
- 2013
- Full Text
- View/download PDF
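The early-rejection idea of entry 710, regressors that predict the final detector response from early-stage scores so unpromising windows are discarded cheaply, can be sketched as follows. All data and the simple linear-regressor choice are invented for illustration, not taken from the authors' models.

```python
import numpy as np

# Illustrative sketch of data-driven early rejection: fit a linear
# regressor that predicts the full detector response from a few
# early-stage scores, then discard windows whose prediction is low.
rng = np.random.default_rng(0)

early = rng.normal(size=(500, 3))     # 3 early-stage scores per window
true_w = np.array([0.5, 0.3, 0.2])    # invented ground-truth weights
full = early @ true_w + 0.05 * rng.normal(size=500)   # final responses

coef, *_ = np.linalg.lstsq(early, full, rcond=None)   # fit the regressor
pred = early @ coef

kept = pred >= 0.0                    # windows worth evaluating fully
print(f"{kept.mean():.0%} of windows kept for full evaluation")
```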
711. Contour-based motion estimation
- Author
-
Davis, Larry S., Wu, Zhongquan, and Sun, Hanfang
- Published
- 1982
- Full Text
- View/download PDF
712. Building an Open-Vocabulary Video CLIP Model With Better Architectures, Optimization and Data.
- Author
-
Wu Z, Weng Z, Peng W, Yang X, Li A, Davis LS, and Jiang YG
- Abstract
Despite significant results achieved by Contrastive Language-Image Pretraining (CLIP) in zero-shot image recognition, limited effort has been made exploring its potential for zero-shot video recognition. This paper presents Open-VCLIP++, a simple yet effective framework that adapts CLIP to a strong zero-shot video classifier, capable of identifying novel actions and events during testing. Open-VCLIP++ minimally modifies CLIP to capture spatial-temporal relationships in videos, thereby creating a specialized video classifier while striving for generalization. We formally demonstrate that training Open-VCLIP++ is tantamount to continual learning with zero historical data. To address this problem, we introduce Interpolated Weight Optimization, a technique that leverages the advantages of weight interpolation during both training and testing. Furthermore, we build upon large language models to produce fine-grained video descriptions. These detailed descriptions are further aligned with video features, facilitating a better transfer of CLIP to the video domain. Our approach is evaluated on three widely used action recognition datasets, following a variety of zero-shot evaluation protocols. The results demonstrate that our method surpasses existing state-of-the-art techniques by significant margins. Specifically, we achieve zero-shot accuracy scores of 88.1%, 58.7%, and 81.2% on UCF, HMDB, and Kinetics-600 datasets respectively, outpacing the best-performing alternative methods by 8.5%, 8.2%, and 12.3%. We also evaluate our approach on the MSR-VTT video-text retrieval dataset, where it delivers competitive video-to-text and text-to-video retrieval performance, while utilizing substantially less fine-tuning data compared to other methods.
- Published
- 2024
- Full Text
- View/download PDF
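The weight interpolation that entry 712 leverages reduces, at its simplest, to blending parameters between the pretrained and fine-tuned models. A minimal sketch with invented parameter dictionaries (not the authors' code, which interpolates full CLIP weights during both training and testing):

```python
# Minimal sketch of weight interpolation between a pretrained model and a
# fine-tuned copy; parameter names and values here are invented.
def interpolate_weights(pretrained, finetuned, alpha):
    """Blend each parameter: (1 - alpha) * pretrained + alpha * finetuned."""
    return {k: (1 - alpha) * pretrained[k] + alpha * finetuned[k]
            for k in pretrained}

pre = {"w": 1.0, "b": 0.0}
ft = {"w": 3.0, "b": 2.0}
print(interpolate_weights(pre, ft, 0.5))  # {'w': 2.0, 'b': 1.0}
```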
713. Towards Transferable Adversarial Attacks on Image and Video Transformers.
- Author
-
Wei Z, Chen J, Goldblum M, Wu Z, Goldstein T, Jiang YG, and Davis LS
- Abstract
The transferability of adversarial examples across different convolutional neural networks (CNNs) makes it feasible to perform black-box attacks, resulting in security threats for CNNs. However, fewer endeavors have been made to investigate transferable attacks for vision transformers (ViTs), which achieve superior performance on various computer vision tasks. Unlike CNNs, ViTs establish relationships between patches extracted from inputs by the self-attention module. Thus, adversarial examples crafted on CNNs might hardly attack ViTs. To assess the security of ViTs comprehensively, we investigate the transferability across different ViTs in both untargeted and targeted scenarios. More specifically, we propose a Pay No Attention (PNA) attack, which ignores attention gradients during backpropagation to improve the linearity of backpropagation. Additionally, we introduce a PatchOut/CubeOut attack for image/video ViTs. They optimize perturbations within a randomly selected subset of patches/cubes during each iteration, preventing over-fitting to the white-box surrogate ViT model. Furthermore, we maximize the L2 norm of perturbations, ensuring that the generated adversarial examples deviate significantly from the benign ones. These strategies are designed to be harmoniously compatible. Combining them can enhance transferability by jointly considering patch-based inputs and the self-attention of ViTs. Moreover, the proposed combined attack seamlessly integrates with existing transferable attacks, providing an additional boost to transferability. We conduct experiments on ImageNet and Kinetics-400 for image and video ViTs, respectively. Experimental results demonstrate the effectiveness of the proposed method.
- Published
- 2023
- Full Text
- View/download PDF
714. A Generic Improvement to Deep Residual Networks Based on Gradient Flow.
- Author
-
Santhanam V and Davis LS
- Abstract
Preactivation ResNets consistently outperform the original postactivation ResNets on the CIFAR10/100 classification benchmark. However, these results surprisingly do not carry over to the standard ImageNet benchmark. First, we theoretically analyze this incongruity in terms of how the two variants differ in handling the propagation of gradients. Although identity shortcuts are critical in both variants for improving optimization and performance, we show that postactivation variants enable early layers to receive a diverse dynamic composition of gradients from effectively deeper paths in comparison to preactivation variants, enabling the network to make maximal use of its representational capacity. Second, we show that downsampling projections (while only a few in number) have a significantly detrimental effect on performance. We show that by simply replacing downsampling projections with identity-like dense-reshape shortcuts, the classification results of standard residual architectures such as ResNets, ResNeXts, and SE-Nets improve by up to 1.2% on ImageNet, without any increase in computational complexity (FLOPs).
- Published
- 2020
- Full Text
- View/download PDF
715. Recognizing human actions by learning and matching shape-motion prototype trees.
- Author
-
Jiang Z, Lin Z, and Davis LS
- Subjects
- Humans, Image Interpretation, Computer-Assisted methods, Imaging, Three-Dimensional methods, Algorithms, Pattern Recognition, Automated methods
- Abstract
A shape-motion prototype-based approach is introduced for action recognition. The approach represents an action as a sequence of prototypes for efficient and flexible action matching in long video sequences. During training, an action prototype tree is learned in a joint shape and motion space via hierarchical K-means clustering and each training sequence is represented as a labeled prototype sequence; then a look-up table of prototype-to-prototype distances is generated. During testing, based on a joint probability model of the actor location and action prototype, the actor is tracked while a frame-to-prototype correspondence is established by maximizing the joint probability, which is efficiently performed by searching the learned prototype tree; then actions are recognized using dynamic prototype sequence matching. Distance measures used for sequence matching are rapidly obtained by look-up table indexing, which is an order of magnitude faster than brute-force computation of frame-to-frame distances. Our approach enables robust action matching in challenging situations (such as moving cameras, dynamic backgrounds) and allows automatic alignment of action sequences. Experimental results demonstrate that our approach achieves recognition rates of 92.86 percent on a large gesture data set (with dynamic backgrounds), 100 percent on the Weizmann action data set, 95.77 percent on the KTH action data set, 88 percent on the UCF sports data set, and 87.27 percent on the CMU action data set.
- Published
- 2012
- Full Text
- View/download PDF
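The look-up-table trick of entry 715, quantizing frames to K-means prototypes and replacing frame-to-frame distance computation with table indexing, can be sketched on toy descriptors (dimensions, k, and data are invented; the paper clusters in a joint shape-motion space and searches a prototype tree):

```python
import numpy as np

# Illustrative sketch: quantize frame descriptors to K-means prototypes,
# precompute a prototype-to-prototype distance table once, then compare
# sequences by table indexing instead of frame-to-frame distances.
rng = np.random.default_rng(1)

def kmeans(X, k, iters=20):
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return centers

frames = rng.normal(size=(200, 8))            # toy frame descriptors
protos = kmeans(frames, k=5)
table = np.linalg.norm(protos[:, None] - protos[None], axis=-1)  # 5 x 5

def seq_distance(a, b):
    # a, b: equal-length sequences of prototype indices.
    return table[a, b].sum()

print(seq_distance(np.array([0, 1, 2]), np.array([0, 1, 2])))  # 0.0
```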
716. Shape-based human detection and segmentation via hierarchical part-template matching.
- Author
-
Lin Z and Davis LS
- Subjects
- Artificial Intelligence, Crowding, Decision Trees, Humans, Algorithms, Image Processing, Computer-Assisted methods, Pattern Recognition, Automated methods, Posture physiology
- Abstract
We propose a shape-based, hierarchical part-template matching approach to simultaneous human detection and segmentation combining local part-based and global shape-template-based schemes. The approach relies on the key idea of matching a part-template tree to images hierarchically to detect humans and estimate their poses. For learning a generic human detector, a pose-adaptive feature computation scheme is developed based on a tree matching approach. Instead of traditional concatenation-style image location-based feature encoding, we extract features adaptively in the context of human poses and train a kernel-SVM classifier to separate human/nonhuman patterns. Specifically, the features are collected in the local context of poses by tracing around the estimated shape boundaries. We also introduce an approach to multiple occluded human detection and segmentation based on an iterative occlusion compensation scheme. The output of our learned generic human detector can be used as an initial set of human hypotheses for the iterative optimization. We evaluate our approaches on three public pedestrian data sets (INRIA, MIT-CBCL, and USC-B) and two crowded sequences from Caviar Benchmark and Munich Airport data sets.
- Published
- 2010
- Full Text
- View/download PDF
717. Visual tracking by continuous density propagation in sequential bayesian filtering framework.
- Author
-
Han B, Zhu Y, Comaniciu D, and Davis LS
- Subjects
- Bayes Theorem, Image Enhancement methods, Reproducibility of Results, Sensitivity and Specificity, Algorithms, Artificial Intelligence, Image Interpretation, Computer-Assisted methods, Pattern Recognition, Automated methods, Signal Processing, Computer-Assisted, Subtraction Technique
- Abstract
Particle filtering is frequently used for visual tracking problems since it provides a general framework for estimating and propagating probability density functions for nonlinear and non-Gaussian dynamic systems. However, this algorithm is based on a Monte Carlo approach and the cost of sampling and measurement is a problematic issue, especially for high-dimensional problems. We describe an alternative to the classical particle filter in which the underlying density function has an analytic representation for better approximation and effective propagation. The techniques of density interpolation and density approximation are introduced to represent the likelihood and the posterior densities with Gaussian mixtures, where all relevant parameters are automatically determined. The proposed analytic approach is shown to perform more efficiently in sampling in high-dimensional space. We apply the algorithm to real-time tracking problems and demonstrate its performance on real video sequences as well as synthetic examples.
- Published
- 2009
- Full Text
- View/download PDF
718. Sequential kernel density approximation and its application to real-time visual tracking.
- Author
-
Han B, Comaniciu D, Zhu Y, and Davis LS
- Subjects
- Computer Simulation, Computer Systems, Image Enhancement methods, Models, Statistical, Motion, Reproducibility of Results, Sensitivity and Specificity, Statistical Distributions, Algorithms, Artificial Intelligence, Image Interpretation, Computer-Assisted methods, Pattern Recognition, Automated methods, Subtraction Technique
- Abstract
Visual features are commonly modeled with probability density functions in computer vision problems, but current methods such as a mixture of Gaussians and kernel density estimation suffer from either the lack of flexibility, by fixing or limiting the number of Gaussian components in the mixture, or large memory requirement, by maintaining a non-parametric representation of the density. These problems are aggravated in real-time computer vision applications since density functions are required to be updated as new data becomes available. We present a novel kernel density approximation technique based on the mean-shift mode finding algorithm, and describe an efficient method to sequentially propagate the density modes over time. While the proposed density representation is memory efficient, which is typical for mixture densities, it inherits the flexibility of non-parametric methods by allowing the number of components to be variable. The accuracy and compactness of the sequential kernel density approximation technique is illustrated by both simulations and experiments. Sequential kernel density approximation is applied to on-line target appearance modeling for visual tracking, and its performance is demonstrated on a variety of videos.
- Published
- 2008
- Full Text
- View/download PDF
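The mean-shift mode finding at the core of entry 718 can be illustrated in 1-D: each sample climbs to a density mode, and samples that converge to the same mode can be merged into a single mixture component. Bandwidth and data below are invented; the sequential propagation of modes over time is not shown.

```python
import numpy as np

# Minimal 1-D sketch of mean-shift mode finding: iterate each point toward
# the Gaussian-kernel weighted mean of all samples until it reaches a
# density mode, then merge points that share a mode.
def mean_shift_modes(samples, bandwidth=1.0, iters=50):
    x = samples.astype(float).copy()
    for _ in range(iters):
        w = np.exp(-0.5 * ((x[:, None] - samples[None]) / bandwidth) ** 2)
        x = (w * samples[None]).sum(axis=1) / w.sum(axis=1)
    return np.unique(np.round(x, 3))

rng = np.random.default_rng(2)
data = np.concatenate([rng.normal(0.0, 0.3, 100), rng.normal(5.0, 0.3, 100)])
modes = mean_shift_modes(data, bandwidth=0.5)
print(len(modes))  # the 200 samples collapse to 2 modes (near 0 and 5)
```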
Discovery Service for Jio Institute Digital Library