36 results for "Carlsson, Stefan"
Search Results
2. Self-tuned Visual Subclass Learning with Shared Samples: An Incremental Approach
- Author
-
Azizpour, Hossein and Carlsson, Stefan
- Subjects
FOS: Computer and information sciences, Datorsystem, Computer Systems, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
- Abstract
Computer vision tasks are traditionally defined and evaluated using semantic categories. However, it is known in the field that semantic classes do not necessarily correspond to a unique visual class (e.g. the inside and outside of a car). Furthermore, many of the feasible learning techniques at hand cannot model a visual class which appears consistent to the human eye. These problems have motivated the use of: 1) unsupervised or supervised clustering as a preprocessing step to identify the visual subclasses to be used in a mixture-of-experts learning regime; 2) latent-variable mixture assignment, as in the part model of Felzenszwalb et al. and related work, optimized during learning; 3) highly non-linear classifiers, which are inherently capable of modelling a multi-modal input space but are inefficient at test time. In this work, we promote an incremental view of the recognition of semantic classes with varied appearances. We propose an optimization technique which incrementally finds maximal visual subclasses in a regularized risk minimization framework. Our proposed approach unifies the clustering and classification steps in a single algorithm. Its importance lies in the fact that, unlike pre-processing clustering methods, it does not need to know the number of clusters, the representation, or the similarity measure a priori. Following this approach we show significant results, both qualitatively and quantitatively. We show that the visual subclasses exhibit a long-tailed distribution. Finally, we show that state-of-the-art object detection methods (e.g. DPM) are unable to use the tail of this distribution, which comprises 50% of the training samples. In fact, we show that DPM performance slightly increases on average when this half of the data is removed., Comment: Updated ICCV 2013 submission
- Published
- 2013
3. From Generic to Specific Deep Representations for Visual Recognition
- Author
-
Azizpour, Hossein, Razavian, Ali Sharif, Sullivan, Josephine, Maki, Atsuto, and Carlsson, Stefan
- Abstract
Evidence is mounting that ConvNets are the best representation learning method for recognition. In the common scenario, a ConvNet is trained on a large labeled dataset and the feed-forward unit activations at a certain layer of the network are used as a generic representation of an input image. Recent studies have shown this form of representation to be astoundingly effective for a wide range of recognition tasks. This paper thoroughly investigates the transferability of such representations w.r.t. several factors, including the parameters used for training the network, such as its architecture, and the parameters of feature extraction. We further show that different visual recognition tasks can be categorically ordered based on their distance from the source task. We then show interesting results indicating a clear correlation between the performance on a task and its distance from the source task, conditioned on the proposed factors. Furthermore, by optimizing these factors, we achieve state-of-the-art performance on 16 visual recognition tasks.
- Published
- 2015
- Full Text
- View/download PDF
4. Visual instance retrieval with deep convolutional networks
- Author
-
Razavian, Ali Sharif, Sullivan, Josephine, Carlsson, Stefan, and Maki, Atsuto
- Abstract
This paper provides an extensive study on the availability of image representations based on convolutional networks (ConvNets) for the task of visual instance retrieval. Besides the choice of convolutional layers, we present an efficient pipeline exploiting multi-scale schemes to extract local features, in particular, by taking geometric invariance into explicit account, i.e. positions, scales and spatial consistency. In our experiments using five standard image retrieval datasets, we demonstrate that generic ConvNet image representations can outperform other state-of-the-art methods if they are extracted appropriately.
- Published
- 2015
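A minimal NumPy sketch of the kind of retrieval pipeline "Visual instance retrieval with deep convolutional networks" describes — spatially pooled, l2-normalized ConvNet features compared by cosine similarity. This is not the authors' code: the function names, the max-pooling choice, and the simple average over scales are illustrative assumptions.

```python
import numpy as np

def pool_descriptor(feature_map):
    """Max-pool a C x H x W convolutional feature map over the spatial
    dimensions and l2-normalize, giving one C-dim image descriptor."""
    desc = feature_map.max(axis=(1, 2))
    return desc / (np.linalg.norm(desc) + 1e-12)

def multiscale_descriptor(feature_maps):
    """Average the per-scale descriptors (one feature map per input scale),
    then re-normalize; a crude stand-in for the paper's multi-scale scheme."""
    desc = np.mean([pool_descriptor(f) for f in feature_maps], axis=0)
    return desc / (np.linalg.norm(desc) + 1e-12)

def rank_database(query, database):
    """Return database row indices sorted by descending cosine similarity
    (rows of `database` are assumed to be unit-norm descriptors)."""
    return np.argsort(-(database @ query))
```

In a real system the feature maps would come from a trained ConvNet evaluated on several rescaled crops of the image; here any C x H x W arrays stand in for them.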
5. Spotlight the Negatives : A Generalized Discriminative Latent Model
- Author
-
Azizpour, Hossein, Arefiyan, Mostafa, Naderi Parizi, Sobhan, and Carlsson, Stefan
- Abstract
Discriminative latent variable models (LVM) are frequently applied to various visual recognition tasks. In these systems the latent (hidden) variables provide a formalism for modeling structured variation of visual features. Conventionally, latent variables are defined on the variation of the foreground (positive) class. In this work we augment LVMs to include negative latent variables corresponding to the background class. We formalize the scoring function of such a generalized LVM (GLVM). Then we discuss a framework for learning a model based on the GLVM scoring function. We theoretically showcase how some of the current visual recognition methods can benefit from this generalization. Finally, we experiment on a generalized form of Deformable Part Models with negative latent variables and show significant improvements on two different detection tasks.
- Published
- 2015
6. Persistent Evidence of Local Image Properties in Generic ConvNets
- Author
-
Sharif Razavian, Ali, Azizpour, Hossein, Maki, Atsuto, Sullivan, Josephine, Ek, Carl Henrik, and Carlsson, Stefan
- Abstract
Supervised training of a convolutional network for object classification should make explicit any information related to the class of objects and disregard any auxiliary information associated with the capture of the image or the variation within the object class. Does this happen in practice? Although this seems to pertain to the very final layers in the network, if we look at earlier layers we find that this is not the case. Surprisingly, strong spatial information is implicit. This paper addresses this, in particular exploiting the image representation at the first fully connected layer, i.e. the global image descriptor which has recently been shown to be most effective in a range of visual recognition tasks. We empirically demonstrate evidence for the finding in the context of four different tasks: 2D landmark detection, 2D object keypoint prediction, estimation of the RGB values of the input image, and recovery of the semantic label of each pixel. We base our investigation on a simple framework with ridge regression common across these tasks, and show results which all support our insight. Such spatial information can be used for computing correspondence of landmarks to a good accuracy, and should potentially be useful for improving the training of convolutional nets for classification purposes.
- Published
- 2015
- Full Text
- View/download PDF
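The probing used in "Persistent Evidence of Local Image Properties in Generic ConvNets" is ridge regression from a fixed image descriptor to a spatial target, which has a closed-form solution. A hedged NumPy sketch, with my own function names and synthetic data standing in for real ConvNet descriptors and landmark coordinates:

```python
import numpy as np

def ridge_fit(X, Y, lam=1.0):
    """Closed-form ridge regression W = (X^T X + lam*I)^{-1} X^T Y.
    X is n x d (image descriptors), Y is n x k (targets, e.g. 2D
    landmark coordinates or per-pixel properties)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def ridge_predict(X, W):
    """Predict the spatial targets for new descriptors."""
    return X @ W
```

If the targets can be regressed accurately from the descriptor, the descriptor must still carry that spatial information — which is the paper's point.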
7. CNN features off-the-shelf : An Astounding Baseline for Recognition
- Author
-
Sharif Razavian, Ali, Azizpour, Hossein, Sullivan, Josephine, and Carlsson, Stefan
- Abstract
Recent results indicate that the generic descriptors extracted from convolutional neural networks are very powerful. This paper adds to the mounting evidence that this is indeed the case. We report on a series of experiments conducted for different recognition tasks using the publicly available code and model of the OverFeat network, which was trained to perform object classification on ILSVRC13. We use features extracted from the OverFeat network as a generic image representation to tackle the diverse range of recognition tasks of object image classification, scene recognition, fine-grained recognition, attribute detection and image retrieval, applied to a diverse set of datasets. We selected these tasks and datasets as they gradually move further away from the original task and data the OverFeat network was trained to solve. Astonishingly, we report consistently superior results compared to the highly tuned state-of-the-art systems in all the visual classification tasks on various datasets. For instance retrieval it consistently outperforms low-memory-footprint methods except for the sculptures dataset. The results are achieved using a linear SVM classifier (or L2 distance in the case of retrieval) applied to a feature representation of size 4096 extracted from a layer in the net. The representations are further modified using simple augmentation techniques, e.g. jittering. The results strongly suggest that features obtained from deep learning with convolutional nets should be the primary candidate in most visual recognition tasks., Best Paper Runner-up Award
- Published
- 2014
- Full Text
- View/download PDF
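The classification recipe in "CNN features off-the-shelf" is simply a linear SVM trained on fixed 4096-dimensional features. As a sketch, assuming precomputed feature vectors, here is a minimal Pegasos-style subgradient trainer in NumPy — not the authors' code nor any particular SVM package, and the bias handling is a common heuristic rather than part of the original Pegasos formulation:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style subgradient training of a linear SVM on precomputed
    feature vectors X (n x d) with labels y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b, t = np.zeros(d), 0.0, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)          # step size 1/(lam*t)
            margin = y[i] * (X[i] @ w + b)
            w *= (1.0 - eta * lam)          # shrink from the L2 regularizer
            if margin < 1:                  # hinge-loss subgradient step
                w += eta * y[i] * X[i]
                b += eta * y[i]
    return w, b
```

Scoring a new image is then just `np.sign(x @ w + b)` on its feature vector, mirroring the paper's "linear SVM on top of off-the-shelf features" setup.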
8. Initialization framework for latent variable models
- Author
-
Afkham, Heydar Maboudi, Ek, Carl Henrik, and Carlsson, Stefan
- Abstract
In this paper, we discuss the properties of a class of latent variable models that assumes each labeled sample is associated with a set of different features, with no prior knowledge of which feature is the most relevant one to use. Deformable Part Models (DPM) are a good example of such models. While the latent SVM framework (LSVM) has proven to be an efficient tool for solving these models, we argue that the solution found by this tool is very sensitive to the initialization. To decrease this dependency, we propose a novel clustering procedure for these problems that finds cluster centers shared by several sample sets while ignoring the remaining cluster centers. As we show, these cluster centers provide a robust initialization for the LSVM framework.
- Published
- 2014
- Full Text
- View/download PDF
9. A topological framework for training latent variable models
- Author
-
Afkham, Heydar Maboudi, Ek, Carl Henrik, and Carlsson, Stefan
- Abstract
We discuss the properties of a class of latent variable models that assumes each labeled sample is associated with a set of different features, with no prior knowledge of which feature is the most relevant feature to be used. Deformable-Part Models (DPM) can be seen as good examples of such models. These models are usually considered to be expensive to train and very sensitive to the initialization. In this paper, we focus on the learning of such models by introducing a topological framework and show how it is possible to both reduce the learning complexity and produce more robust decision boundaries. We will also argue how our framework can be used for producing robust decision boundaries without exploiting the dataset bias or relying on accurate annotations. To experimentally evaluate our method and compare with previously published frameworks, we focus on the problem of image classification with object localization. In this problem, the correct location of the objects is unknown, during both training and testing stages, and is considered as a latent variable.
- Published
- 2014
- Full Text
- View/download PDF
10. Estimating Attention in Exhibitions Using Wearable Cameras
- Author
-
Sharif Razavian, Ali, Aghazadeh, Omid, Sullivan, Josephine, and Carlsson, Stefan
- Abstract
This paper demonstrates a system for the automatic detection of visual attention and the identification of salient items at exhibitions (e.g. a museum or an auction). The method is offline and operates on video captured by a head-mounted camera. Towards the estimation of attention, we define the notions of "saliency" and "interestingness" for exhibition items. Our method combines multiple state-of-the-art techniques from different vision tasks such as tracking, image matching and retrieval. Many experiments are conducted to evaluate multiple aspects of our method. The method has proven to be robust to image blur, occlusion, truncation, and dimness. The experiments show strong performance on the tasks of matching items, estimating focus frames and detecting salient and interesting items. This can be useful to commercial vendors and museum curators, helping them understand which items appeal most to visitors.
- Published
- 2014
- Full Text
- View/download PDF
11. Gradual improvement of image descriptor quality
- Author
-
Afkham, Heydar Maboudi, Ek, Carl Henrik, and Carlsson, Stefan
- Abstract
In this paper, we propose a framework for gradually improving the quality of an already existing image descriptor. The descriptor used in this paper (Afkham et al., 2013) uses the responses of a series of discriminative components to summarize each image. As we show, this descriptor has an ideal form in which all categories become linearly separable. While reaching this form is not feasible, we argue that by replacing a small fraction of these components it is possible to obtain a descriptor which is, on average, closer to this ideal form. To do so, we initially identify which components do not contribute to the quality of the descriptor and replace them with more robust components, using a joint feature selection method to find the improved components. As our experiments show, this change directly reflects in the capability of the resulting descriptor to discriminate between different categories.
- Published
- 2014
- Full Text
- View/download PDF
12. Qualitative vocabulary based descriptor
- Author
-
Maboudi Afkham, Heydar, Ek, Carl Henrik, and Carlsson, Stefan
- Abstract
Creating a single feature descriptor from a collection of feature responses is a frequently occurring task. The bag-of-words descriptor has been very successful in this role and has been applied to data from a large range of domains. Central to this approach is the association of features to words. In this paper we present a novel approach to the feature-to-word association problem. The proposed method creates a more robust representation when data is noisy and requires fewer words than traditional methods while retaining similar performance. We experimentally evaluate the method on a challenging image classification dataset and show a significant improvement over the state of the art.
- Published
- 2013
13. Extracting essential local object characteristics for 3D object categorization
- Author
-
Madry, Marianna, Maboudi Afkham, Heydar, Ek, Carl Henrik, Carlsson, Stefan, and Kragic, Danica
- Abstract
Most object classes share a considerable amount of local appearance and often only a small number of features are discriminative. The traditional approach to represent an object is based on a summarization of the local characteristics by counting the number of feature occurrences. In this paper we propose the use of a recently developed technique for summarization that, rather than looking into the quantity of features, encodes their quality to learn a description of an object. Our approach is based on extracting and aggregating only the essential characteristics of an object class for a task. We show how the proposed method significantly improves on previous work in 3D object categorization. We discuss the benefits of the method in other scenarios such as robot grasping. We provide extensive quantitative and qualitative experiments comparing our approach to the state of the art to justify the described approach.
- Published
- 2013
- Full Text
- View/download PDF
14. Properties of Datasets Predict the Performance of Classifiers
- Author
-
Aghazadeh, Omid and Carlsson, Stefan
- Published
- 2013
15. 3D pictorial structures for multiple view articulated pose estimation
- Author
-
Burenius, Magnus, Sullivan, Josephine, and Carlsson, Stefan
- Abstract
We consider the problem of automatically estimating the 3D pose of humans from images, taken from multiple calibrated views. We show that it is possible and tractable to extend the pictorial structures framework, popular for 2D pose estimation, to 3D. We discuss how to use this framework to impose view, skeleton, joint angle and intersection constraints in 3D. The 3D pictorial structures are evaluated on multiple view data from a professional football game. The evaluation is focused on computational tractability, but we also demonstrate how a simple 2D part detector can be plugged into the framework.
- Published
- 2013
- Full Text
- View/download PDF
16. Multispectral MRI segmentation of age related white matter changes using a cascade of support vector machines
- Author
-
Damangir, Soheil, Manzouri, Amirhossein, Oppedal, Ketil, Carlsson, Stefan, Firbank, Michael J., Sonnesyn, Hogne, Tysnes, Ole-Bjorn, O'Brien, John T., Beyer, Mona K., Westman, Eric, Aarsland, Dag, Wahlund, Lars-Olof, and Spulber, Gabriela
- Abstract
White matter changes (WMC) are the focus of intensive research and have been linked to cognitive impairment and depression in the elderly. Cumbersome manual outlining procedures make research on WMC labor-intensive and prone to subjective bias. We present a fast, fully automated method for WMC segmentation using a cascade of reduced support vector machines (SVMs) with active learning. Data from 102 subjects were used in this study. Two MRI sequences (T1-weighted and FLAIR) and masks of manually outlined WMC from each subject were used for the image analysis. The segmentation framework comprises pre-processing, classification (training and core segmentation) and post-processing. After pre-processing, the model was trained on two subjects and tested on the remaining 100 subjects. The effectiveness and robustness of the classification were assessed using the receiver operating characteristic technique. The cascade-of-SVMs segmentation framework produced accurate results with high sensitivity (90%) and specificity (99.5%), with the manually outlined WMC as reference. An algorithm for the segmentation of WMC is proposed. This is a fully competitive and fast automatic segmentation framework, capable of using different input sequences without changes or restrictions to the image analysis algorithm.
- Published
- 2012
- Full Text
- View/download PDF
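The cascade idea in the WMC segmentation paper — cheap early stages reject most negative samples so later, more expensive stages see only the hard cases — can be sketched generically. The stage functions below are hypothetical threshold classifiers; the actual method uses reduced SVMs trained with active learning.

```python
import numpy as np

def cascade_predict(stages, X):
    """Apply a cascade of (score_fn, threshold) stages to samples X
    (n x d). A sample is labelled positive only if every stage accepts
    it; most negatives are rejected cheaply by the early stages."""
    keep = np.ones(len(X), dtype=bool)
    for score_fn, thr in stages:
        idx = np.where(keep)[0]        # only still-surviving samples
        if idx.size == 0:
            break
        keep[idx] = score_fn(X[idx]) >= thr
    return keep
```

Each stage only scores the samples that survived the previous stages, which is what makes the cascade fast on volumes dominated by easy background voxels.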
17. Multi view registration for novelty/background separation
- Author
-
Aghazadeh, Omid, Sullivan, Josephine, and Carlsson, Stefan
- Abstract
We propose a system for the automatic segmentation of novelties from the background in scenarios where multiple images of the same environment are available, e.g. obtained by wearable cameras. Our method finds the pixels in a query image corresponding to the underlying background environment by comparing it to reference images of the same scene. This is achieved despite the fact that the images may have different viewpoints and significantly different illumination conditions, and may contain different objects (cars, people, bicycles, etc.) occluding the background. We estimate the probability of each pixel in the query image belonging to the background by computing its appearance inconsistency with the multiple reference images. We then produce multiple segmentations of the query image using an iterated graph cuts algorithm, initializing from these estimated probabilities, and successively combine these segmentations into a final segmentation of the background. Detection of the background in turn highlights the novel pixels. We demonstrate the effectiveness of our approach on a challenging outdoor data set.
- Published
- 2012
- Full Text
- View/download PDF
18. Mixture component identification and learning for visual recognition
- Author
-
Aghazadeh, Omid, Azizpour, Hossein, Sullivan, Josephine, and Carlsson, Stefan
- Abstract
The non-linear decision boundary between object and background classes - due to large intra-class variations - needs to be modelled by any classifier wishing to achieve good results. While a mixture of linear classifiers is capable of modelling this non-linearity, learning this mixture from weakly annotated data is non-trivial and is the paper's focus. Our approach is to identify the modes in the distribution of our positive examples by clustering, and to utilize this clustering in a latent SVM formulation to learn the mixture model. The clustering relies on a robust measure of visual similarity which suppresses uninformative clutter by using a novel representation based on the exemplar SVM. This subtle clustering of the data leads to learning better mixture models, as is demonstrated via extensive evaluations on Pascal VOC 2007. The final classifier, using a HOG representation of the global image patch, achieves performance comparable to the state-of-the-art while being more efficient at detection time.
- Published
- 2012
- Full Text
- View/download PDF
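"Mixture component identification and learning for visual recognition" first identifies modes among the positive examples by clustering, then uses the clusters to initialize latent-SVM mixture training. As a stand-in for its exemplar-SVM-based similarity, here is a plain k-means in NumPy — purely illustrative, with a deterministic spread initialization rather than the paper's representation:

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Plain k-means on row vectors X (n x d); a simplified stand-in for
    the paper's exemplar-SVM similarity clustering, used to split the
    positive examples into mixture components."""
    idx = np.linspace(0, len(X) - 1, k).astype(int)   # spread initial centers
    centers = X[idx].astype(float)
    for _ in range(iters):
        # squared distances of every sample to every center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return labels, centers
```

Each resulting cluster would then seed one linear component of the mixture before latent reassignment takes over during training.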
19. Improving feature level likelihoods using cloud features
- Author
-
Maboudi Afkham, Heydar, Carlsson, Stefan, and Sullivan, Josephine
- Abstract
The performance of many computer vision methods depends on the quality of the local features extracted from the images. For most methods the local features are extracted independently of the task and remain constant throughout the whole process. To make features more dynamic and give models a choice in the features they can use, this work introduces a set of intermediate features referred to as cloud features. These features take advantage of part-based models at the feature level by combining each extracted local feature with its nearby local features, creating a cloud of different representations for each local feature. These representations capture the local variations around the local feature. At classification time, the best possible representation is pulled out of the cloud and used in the calculations. This selection is based on several latent variables encoded within the cloud features. The goal of this paper is to test how the cloud features can improve the feature-level likelihoods. The experiments of this paper focus on feature-level inference and show how replacing single features with equivalent cloud features improves the likelihoods obtained from them. The experiments are conducted on several classes of the MSRCv1 dataset.
- Published
- 2012
20. Human 3D Motion Computation from a varying Number of Cameras
- Author
-
Burenius, Magnus, Sullivan, Josephine, Carlsson, Stefan, and Halvorsen, Kjartan
- Abstract
This paper focuses on how the accuracy of marker-less human motion capture is affected by the number of camera views used. Specifically, we compare the 3D reconstructions calculated from single and multiple cameras. We perform our experiments on data consisting of video from multiple cameras synchronized with ground-truth 3D motion, obtained from a motion capture session with a professional footballer. The error is compared for the 3D reconstructions, of diverse motions, estimated using the manually located image joint positions from one, two or three cameras. We also present a new bundle adjustment procedure using regression splines to impose weak prior assumptions about human motion, temporal smoothness and joint angle limits, on the 3D reconstruction. The results show that even under close to ideal circumstances the monocular 3D reconstructions contain visual artifacts not present in the multiple-view case, indicating that accurate and efficient marker-less human motion capture requires multiple cameras.
- Published
- 2011
- Full Text
- View/download PDF
21. Motion Capture from Dynamic Orthographic Cameras
- Author
-
Burenius, Magnus, Sullivan, Josephine, and Carlsson, Stefan
- Abstract
We present an extension to the scaled orthographic camera model. It deals with dynamic cameras looking at faraway objects. The camera is allowed to change focal length and translate and rotate in 3D. The model we derive says that this motion can be treated as scaling, translation and rotation in a 2D image plane. It is valid if the camera and its target move around in two separate regions that are small compared to the distance between them. We show two applications of this model to motion capture at large distances, i.e. outside a studio, using the affine factorization algorithm. The model is used to motivate theoretically why the factorization can be carried out in a single batch step when having both dynamic cameras and a dynamic object. Furthermore, the model is used to motivate how the position of the object can be reconstructed by measuring the virtual 2D motion of the cameras. For testing we use videos from a real football game and reconstruct the 3D motion of a footballer as he scores a goal.
- Published
- 2011
- Full Text
- View/download PDF
22. Generic Object Class Detection using Feature Maps
- Author
-
Danielsson, Oscar and Carlsson, Stefan
- Abstract
In this paper we describe an object class model and a detection scheme based on feature maps, i.e. binary images indicating occurrences of various local features. Any type of local feature and any number of features can be used to generate feature maps. The choice of which features to use can thus be adapted to the task at hand, without changing the general framework. An object class is represented by a boosted decision tree classifier (which may be cascaded) based on normalized distances to feature occurrences. The resulting object class model is essentially a linear combination of a set of flexible configurations of the features used. Within this framework we present an efficient detection scheme that uses a hierarchical search strategy. We demonstrate experimentally that this detection scheme yields a significant speedup compared to sliding window search. We evaluate the detection performance on a standard dataset [7], showing state of the art results. Features used in this paper include edges, corners, blobs and interest points.
- Published
- 2011
- Full Text
- View/download PDF
23. Gated Classifiers : Boosting under high intra-class variation
- Author
-
Danielsson, Oscar, Rasolzadeh, Babak, and Carlsson, Stefan
- Abstract
In this paper we address the problem of using boosting (e.g. AdaBoost [7]) to classify a target class with significant intra-class variation against a large background class. This situation occurs for example when we want to recognize a visual object class against all other image patches. The boosting algorithm produces a strong classifier, which is a linear combination of weak classifiers. We observe that we often have sets of weak classifiers that individually fire on many examples of the target class but never fire together on those examples (i.e. their outputs are anti-correlated on the target class). Motivated by this observation we suggest a family of derived weak classifiers, termed gated classifiers, that suppress such combinations of weak classifiers. Gated classifiers can be used on top of any original weak learner. We run experiments on two popular datasets, showing that our method reduces the required number of weak classifiers by almost an order of magnitude, which in turn yields faster detectors. We experiment on synthetic data, showing that gated classifiers enable more complex distributions to be represented. We hope that gated classifiers will extend the usefulness of boosted classifier cascades [29].
- Published
- 2011
- Full Text
- View/download PDF
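The core observation in "Gated Classifiers" — pairs of weak classifiers whose outputs are anti-correlated on the target class — can be detected and gated as follows. This is a simplified sketch in NumPy with illustrative names and one fixed gate form; the paper's gate construction inside the boosting loop is more general.

```python
import numpy as np

def anticorrelated_pairs(H, max_corr=-0.5):
    """H is an n x m matrix of weak-classifier outputs (+1/-1) on positive
    examples. Return index pairs whose outputs are strongly anti-correlated,
    i.e. weak classifiers that fire on disjoint subsets of the positives."""
    C = np.corrcoef(H.T)
    m = H.shape[1]
    return [(i, j) for i in range(m) for j in range(i + 1, m)
            if C[i, j] <= max_corr]

def gate(h_i, h_j):
    """A derived feature that fires when either constituent fires but
    suppresses the case of both firing, which (for an anti-correlated
    pair) never occurs on the positive examples."""
    both = (h_i > 0) & (h_j > 0)
    either = (h_i > 0) | (h_j > 0)
    return np.where(either & ~both, 1.0, -1.0)
```

The gated feature is then handed back to the weak learner pool, letting boosting express "A or B but not both" with a single weak classifier.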
24. Projectable Classifiers for Multi-View Object Class Recognition
- Author
-
Danielsson, Oscar and Carlsson, Stefan
- Abstract
We propose a multi-view object class modeling framework based on a simplified camera model and surfels (defined by a location and normal direction in a normalized 3D coordinate system) that mediate coarse correspondences between different views. Weak classifiers are learnt relative to the reference frames provided by the surfels. We describe a weak classifier that uses contour information when its corresponding surfel projects to a contour element in the image and color information when the face of the surfel is visible in the image. We emphasize that these weak classifiers can possibly take many different forms and use many different image features. Weak classifiers are combined using AdaBoost. We evaluate the method on a public dataset [8], showing promising results on categorization, recognition/detection, pose estimation and image synthesis.
- Published
- 2011
- Full Text
- View/download PDF
25. Novelty Detection from an Ego-Centric perspective
- Author
-
Aghazadeh, Omid, Sullivan, Josephine, and Carlsson, Stefan
- Abstract
This paper demonstrates a system for the automatic extraction of novelty in images captured from a small video camera attached to a subject's chest, replicating his visual perspective, while performing activities which are repeated daily. Novelty is detected when a (sub)sequence cannot be registered to previously stored sequences captured while performing the same daily activity. Sequence registration is performed by measuring appearance and geometric similarity of individual frames and exploiting the invariant temporal order of the activity. Experimental results demonstrate that this is a robust way to detect novelties induced by variations in the wearer's ego-motion such as stopping and talking to a person. This is an essentially new and generic way of automatically extracting information of interest to the camera wearer and can be used as input to a system for life logging or memory support.
- Published
- 2011
- Full Text
- View/download PDF
26. Generic Object Class Detection using Boosted Configurations of Oriented Edges
- Author
-
Danielsson, Oscar and Carlsson, Stefan
- Abstract
In this paper we introduce a new representation for shape-based object class detection. This representation is based on very sparse and slightly flexible configurations of oriented edges. An ensemble of such configurations is learnt in a boosting framework. Each edge configuration can capture some local or global shape property of the target class, so the representation is not limited to representing and detecting visual classes that have distinctive local structures. The representation is also able to handle significant intra-class variation. It allows for very efficient detection and can be learnt automatically from weakly labelled training images of the target class. The main drawback of the method is that, since its inductive bias is rather weak, it needs a comparatively large training set. We evaluate on a standard database [1] and, when using a slightly extended training set, our method outperforms the state of the art [2] on four out of five classes.
- Published
- 2010
- Full Text
- View/download PDF
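The boosting framework mentioned in the abstract above can be sketched as discrete AdaBoost over precomputed weak-classifier outputs. This is a generic illustration, not the paper's actual edge-configuration weak learners: each row of `weak_outputs` stands in for one candidate configuration's ±1 responses on the training set.

```python
import numpy as np

def adaboost(weak_outputs, labels, rounds):
    """Discrete AdaBoost. weak_outputs: (n_weak, n_samples) in {-1,+1};
    labels: (n_samples,) in {-1,+1}. Returns chosen indices and weights."""
    n_weak, n = weak_outputs.shape
    w = np.full(n, 1.0 / n)            # uniform sample weights to start
    chosen, alphas = [], []
    for _ in range(rounds):
        # weighted error of every candidate weak classifier
        errs = np.array([(w * (h != labels)).sum() for h in weak_outputs])
        j = int(np.argmin(errs))
        eps = max(float(errs[j]), 1e-12)
        if eps >= 0.5:
            break                      # no weak learner better than chance
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        # up-weight the samples the chosen classifier got wrong
        w *= np.exp(-alpha * labels * weak_outputs[j])
        w /= w.sum()
        chosen.append(j)
        alphas.append(alpha)
    return chosen, alphas

def strong_classify(weak_outputs, chosen, alphas):
    """Sign of the weighted vote of the selected weak classifiers."""
    score = sum(a * weak_outputs[j] for j, a in zip(chosen, alphas))
    return np.sign(score)
```

After a few rounds the weighted ensemble can separate samples that no single weak classifier classifies correctly on its own.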
27. Automatic Learning and Extraction of Multi-Local Features
- Author
-
Danielsson, Oscar, Carlsson, Stefan, and Sullivan, Josephine
- Abstract
In this paper we introduce a new kind of feature, the multi-local feature, so named as each one is a collection of local features, such as oriented edgels, in a very specific spatial arrangement. A multi-local feature has the ability to capture underlying constant shape properties of exemplars from an object class. Thus it is particularly suited to representing and detecting visual classes that lack distinctive local structures and are mainly defined by their global shape. We present algorithms to automatically learn an ensemble of these features to represent an object class from weakly labelled training images of that class, as well as procedures to detect these features efficiently in novel images. The power of multi-local features is demonstrated by using the ensemble in a simple voting scheme to perform object category detection on a standard database. Despite its simplicity, this scheme yields detection rates matching state-of-the-art object detection systems.
- Published
- 2009
- Full Text
- View/download PDF
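The "simple voting scheme" in the abstract above can be pictured as Hough-style voting for the object centre. This is a toy version under stated assumptions: each detected feature is assumed to carry a training-time offset `(dx, dy)` to the object centre, and the bin size is arbitrary; the paper's actual procedure is not reproduced here.

```python
import numpy as np

def vote_for_centers(detections, image_shape, bin_size=8):
    """Each detection is (x, y, dx, dy): a feature found at (x, y) whose
    learned offset to the object centre is (dx, dy). Every detection casts
    one vote for a predicted centre; the accumulator peak is the
    hypothesised object location."""
    h_bins = image_shape[0] // bin_size + 1
    w_bins = image_shape[1] // bin_size + 1
    acc = np.zeros((h_bins, w_bins))
    for x, y, dx, dy in detections:
        cx, cy = x + dx, y + dy
        if 0 <= cy < image_shape[0] and 0 <= cx < image_shape[1]:
            acc[int(cy) // bin_size, int(cx) // bin_size] += 1
    by, bx = np.unravel_index(np.argmax(acc), acc.shape)
    # return the centre of the winning bin and its vote count
    return (bx * bin_size + bin_size // 2, by * bin_size + bin_size // 2), acc.max()
```

Consistent detections concentrate their votes in one bin, so a single spurious feature cannot move the peak.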
28. Object Detection using Multi-Local Feature Manifolds
- Author
-
Danielsson, Oscar, Carlsson, Stefan, and Sullivan, Josephine
- Abstract
Many object categories are better characterized by the shape of their contour than by local appearance properties like texture or color. Multi-local features are designed to capture the global discriminative structure of an object while avoiding the drawbacks of traditional global descriptors, such as sensitivity to irrelevant image properties. The specific structure of multi-local features allows us to generate new feature exemplars by linear combinations, which effectively increases the set of stored training exemplars. We demonstrate that a multi-local feature is a good "weak detector" of shape-based object categories and that it can accurately estimate the bounding box of objects in an image. Using just a single multi-local feature descriptor we obtain detection results comparable to those of more complex and elaborate systems. It is our opinion that multi-local features have great potential as generic object descriptors, with very interesting possibilities for feature sharing within and between classes.
- Published
- 2008
- Full Text
- View/download PDF
29. Exploiting Part-Based Models and Edge Boundaries for Object Detection
- Author
-
Sullivan, Josephine, Danielsson, Oscar, and Carlsson, Stefan
- Abstract
This paper explores how to exploit shape information to perform object class recognition. We use a sparse part-based model to describe object categories defined by shape. The sparseness allows the relative spatial relationships between parts to be described simply. With this model it is possible to highlight potential locations of the object and its parts in novel images. These areas are subsequently examined by a more flexible shape model that measures whether the image data provides evidence for boundary/connecting curves between connected hypothesized parts. From these measurements a very simple cost function can be constructed which indicates the presence or absence of the object class. The part-based model is designed to decouple variations due to affine warps from other forms of shape deformation. The latter are modeled probabilistically using conditional probability distributions which describe the linear dependencies between the location of a part and a subset of the other parts. These conditional distributions can then be exploited to search efficiently for instances of the part model in novel images. Results are reported on experiments performed on the ETHZ shape classes database, which features heavily cluttered images and large variations in scale.
- Published
- 2008
- Full Text
- View/download PDF
30. Tracking and labelling of interacting multiple targets
- Author
-
Sullivan, Josephine and Carlsson, Stefan
- Abstract
Successful multi-target tracking requires solving two problems: localizing the targets and labelling their identities. An isolated target's identity can be unambiguously preserved from one frame to the next. However, in long sequences with many moving targets, like a football game, grouping scenarios will occur in which identity labellings cannot be maintained reliably using continuity of motion or appearance. This paper describes how to match targets' identities despite these interactions. Trajectories covering the periods when a target is isolated are found; these trajectories end when targets interact and their labellings cannot be maintained. The interactions (merges and splits) of these trajectories form a graph structure. Appropriate feature vectors summarizing particular qualities of each trajectory are extracted, and a clustering procedure based on these feature vectors allows the identities of temporally separated trajectories to be matched. Results are shown from a football match captured by a wide-screen system giving a full stationary view of the pitch.
- Published
- 2006
- Full Text
- View/download PDF
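The clustering step in the abstract above can be sketched very schematically. In the paper, each feature vector summarises qualities of one trajectory (appearance, motion); here they are just generic vectors, and the grouping rule is a simple greedy distance threshold rather than the paper's actual procedure.

```python
import numpy as np

def match_trajectories(features, threshold):
    """Greedy agglomerative grouping: trajectories whose summary feature
    vectors lie within `threshold` (Euclidean distance) are assigned the
    same identity label, so temporally separated trajectories of the same
    target end up matched."""
    n = len(features)
    labels = list(range(n))          # each trajectory starts as its own identity
    feats = [np.asarray(f, float) for f in features]
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(feats[i] - feats[j]) < threshold:
                # merge: relabel everything in j's group into i's group
                old, new = labels[j], labels[i]
                labels = [new if l == old else l for l in labels]
    return labels
```

Two trajectories with nearly identical descriptors receive the same label even if they never overlap in time.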
31. Maximizing validity in 2D motion analysis
- Author
-
Eriksson, Martin and Carlsson, Stefan
- Abstract
Classifying and analyzing human motion from video is relatively common in many areas. Since the motion is carried out in 3D space, the 2D projection provided by a video is somewhat limiting. The question we investigate in this article is how much information is actually lost when going from 3D to 2D, and how this information loss depends on factors such as viewpoint and the tracking errors that will inevitably occur if the 2D sequences are analysed automatically.
- Published
- 2004
- Full Text
- View/download PDF
32. Appearance based qualitative image description for object class recognition
- Author
-
Thureson, Johan and Carlsson, Stefan
- Abstract
The problem of recognizing classes of objects, as opposed to specific instances, requires methods of comparing images that capture the variation within the class while discriminating against objects outside the class. We present a simple method for image description based on histograms of qualitative shape indexes computed from combinations of triplets of sampled locations and gradient directions in the image. We demonstrate that this method is indeed able to capture variation within classes of objects, and we apply it to the problem of recognizing four different categories from a large database. Using our descriptor on the whole image, containing varying degrees of background clutter, we obtain results for two of the objects that are superior to the best results published so far for this database. By cropping images manually we demonstrate that our method has the potential to handle the other objects as well when supplied with an algorithm for searching the image. We argue that our method, based on qualitative image properties, captures the large range of variation that is typically encountered within an object class. This means that it can be used on substantially larger image patches than existing methods based on simpler criteria for evaluating image similarity.
- Published
- 2004
- Full Text
- View/download PDF
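The triplet-based descriptor in the abstract above can be illustrated with a deliberately simplified qualitative index: the orientation sign (clockwise vs. counter-clockwise) of each point triplet combined with the three quantised gradient directions. This specific index and binning are hypothetical, chosen only to show the shape of the computation, not the paper's actual shape indexes.

```python
import itertools
import numpy as np

def triplet_histogram(points, grad_dirs, n_dir_bins=4):
    """Histogram of qualitative indices over all point triplets.
    points: list of (x, y); grad_dirs: gradient direction (radians)
    at each point. Returns a normalised descriptor vector."""
    bins = 2 * n_dir_bins ** 3
    hist = np.zeros(bins)
    # quantise each gradient direction into n_dir_bins sectors
    dirs = (np.asarray(grad_dirs) / (2 * np.pi) * n_dir_bins).astype(int) % n_dir_bins
    for i, j, k in itertools.combinations(range(len(points)), 3):
        (x1, y1), (x2, y2), (x3, y3) = points[i], points[j], points[k]
        # qualitative part: orientation sign of the point triangle
        cross = (x2 - x1) * (y3 - y1) - (y2 - y1) * (x3 - x1)
        s = 1 if cross > 0 else 0
        idx = s * n_dir_bins ** 3 + (dirs[i] * n_dir_bins + dirs[j]) * n_dir_bins + dirs[k]
        hist[idx] += 1
    return hist / max(hist.sum(), 1)
```

Because each bin depends only on relative geometry and coarse directions, the histogram tolerates the within-class variation the abstract emphasises.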
33. Monocular reconstruction of human motion by qualitative selection
- Author
-
Eriksson, Martin and Carlsson, Stefan
- Abstract
One of the main difficulties when reconstructing human motion from monocular video is the depth ambiguity. Achieving a reconstruction, given the projection of the joints, can be regarded as a search problem, where the objective is to find the most likely configuration. One inherent problem in such a formulation is the definition of "most likely". In this work we pick the configuration that best complies with a set of training data in a qualitative sense. The reason for doing this is to allow for large individual variation within the class of motions and to avoid an extreme bias towards the training data. In order to capture the qualitative constraints, we use a set of 3D motion capture data of walking people. The method is tested on orthographic projections of motion capture data, in order to compare the achieved reconstruction with the original motion.
- Published
- 2004
- Full Text
- View/download PDF
34. Monocular 3D reconstruction of human motion in long action sequences
- Author
-
Loy, Gareth, Eriksson, Martin, Sullivan, Josephine, and Carlsson, Stefan
- Abstract
A novel algorithm is presented for the 3D reconstruction of human action in long (> 30 second) monocular image sequences. A sequence is represented by a small set of automatically found representative keyframes. The skeletal joint positions are manually located in each keyframe and mapped to all other frames in the sequence. For each keyframe a 3D key pose is created, and interpolation between these 3D body poses, together with the incorporation of limb length and symmetry constraints, provides a smooth initial approximation of the 3D motion. This is then fitted to the image data to generate a realistic 3D reconstruction. The degree of manual input required is controlled by the diversity of the sequence's content. Sports footage is ideally suited to this approach as it frequently contains a limited number of repeated actions. Our method is demonstrated on a long (36 second) sequence of a woman playing tennis filmed with a non-stationary camera. This sequence required manual initialisation on < 1.5% of the frames, and demonstrates that the system can deal with very rapid motion, severe self-occlusions, motion blur and clutter occurring over several concurrent frames. The monocular 3D reconstruction is verified by synthesising a view from the perspective of a 'ground truth' reference camera, and the result is seen to provide a qualitatively accurate 3D reconstruction of the motion.
- Published
- 2004
- Full Text
- View/download PDF
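The interpolation between 3D key poses described in the abstract above can be sketched as piecewise-linear interpolation of joint positions. This is only the initial-approximation step, without the limb-length and symmetry constraints or the subsequent image fitting; the linear scheme is an assumption (the paper does not specify the interpolant here).

```python
import numpy as np

def interpolate_poses(key_frames, key_poses, query_frame):
    """Piecewise-linear interpolation of 3D key poses.
    key_frames: sorted frame indices of the keyframes.
    key_poses: list of (n_joints, 3) arrays, one per keyframe.
    Returns the interpolated (n_joints, 3) pose at query_frame."""
    key_frames = np.asarray(key_frames)
    if query_frame <= key_frames[0]:
        return np.asarray(key_poses[0], float)
    if query_frame >= key_frames[-1]:
        return np.asarray(key_poses[-1], float)
    # locate the keyframe interval containing the query frame
    i = int(np.searchsorted(key_frames, query_frame, side="right")) - 1
    t = (query_frame - key_frames[i]) / (key_frames[i + 1] - key_frames[i])
    a, b = np.asarray(key_poses[i], float), np.asarray(key_poses[i + 1], float)
    return (1.0 - t) * a + t * b
```

Sampling this interpolant at every frame yields the smooth initial 3D motion that is then refined against the image data.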
35. Method and device for generating wide image sequences
- Author
-
Carlsson, Stefan, Hayman, Eric, and Sullivan, Josephine
- Abstract
The invention relates to a video recording apparatus comprising: a microprocessor (130); a memory means (120) storing a program for generating a set of calibration parameters related to a device having at least two video cameras arranged in a predetermined relationship to each other, said parameters being unique to the at least two cameras and their current location relative to the object being recorded, said memory means (120) also storing a program for the recording of wide image video sequences; read and write memory means (140) for storing data relating to recorded video sequences from at least two video cameras; input means (300) for manual input of parameters and input of recorded video sequences; and output means (300) for output of a wide image video sequence. The invention also relates to a method for generating a wide image video sequence, said method comprising the steps of: generating a set of calibration parameters related to a device having at least two video cameras arranged in a predetermined relationship to each other, said parameters being unique to the at least two cameras and their current location relative to the object being recorded; recording video sequences synchronously using each of said at least two video cameras; and generating a wide image video sequence from said synchronously recorded video sequences.
- Published
- 2004
36. Large Scale, Large Margin Classification using Indefinite Similarity Measures
- Author
-
Aghazadeh, Omid and Carlsson, Stefan