36 results for "Carlsson, Stefan"
Search Results
2. Self-tuned Visual Subclass Learning with Shared Samples: An Incremental Approach
- Author
-
Azizpour, Hossein and Carlsson, Stefan
- Subjects
FOS: Computer and information sciences, Datorsystem, Computer Systems, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition
- Abstract
Computer vision tasks are traditionally defined and evaluated using semantic categories. However, it is known in the field that semantic classes do not necessarily correspond to a unique visual class (e.g. the inside and outside of a car). Furthermore, many of the feasible learning techniques at hand cannot model a visual class which appears consistent to the human eye. These problems have motivated the use of: 1) unsupervised or supervised clustering as a preprocessing step to identify the visual subclasses to be used in a mixture-of-experts learning regime; 2) latent-variable mixture assignment, as in the part model of Felzenszwalb et al. and related work, optimized during learning; 3) highly non-linear classifiers, which are inherently capable of modelling a multi-modal input space but are inefficient at test time. In this work, we promote an incremental view of the recognition of semantic classes with varied appearances. We propose an optimization technique which incrementally finds maximal visual subclasses in a regularized risk minimization framework. Our proposed approach unifies the clustering and classification steps in a single algorithm. Its importance lies in the fact that, unlike pre-processing clustering methods, it does not need to know the number of clusters, the representation, or the similarity measure a priori. Following this approach we show significant results, both qualitatively and quantitatively. We show that the visual subclasses exhibit a long-tailed distribution. Finally, we show that state-of-the-art object detection methods (e.g. DPM) are unable to use the tail of this distribution, which comprises 50% of the training samples. In fact, we show that DPM performance slightly increases on average when this half of the data is removed., Comment: Updated ICCV 2013 submission
- Published
- 2013
3. From Generic to Specific Deep Representations for Visual Recognition
- Author
-
Azizpour, Hossein, Razavian, Ali Sharif, Sullivan, Josephine, Maki, Atsuto, and Carlsson, Stefan
- Abstract
Evidence is mounting that ConvNets are the best representation learning method for recognition. In the common scenario, a ConvNet is trained on a large labeled dataset and the feed-forward unit activations at a certain layer of the network are used as a generic representation of an input image. Recent studies have shown this form of representation to be astoundingly effective for a wide range of recognition tasks. This paper thoroughly investigates the transferability of such representations w.r.t. several factors, including the parameters used for training the network, such as its architecture, and the parameters of feature extraction. We further show that different visual recognition tasks can be categorically ordered based on their distance from the source task. We then show interesting results indicating a clear correlation between the performance on a task and its distance from the source task, conditioned on the proposed factors. Furthermore, by optimizing these factors, we achieve state-of-the-art performance on 16 visual recognition tasks.
- Published
- 2015
- Full Text
- View/download PDF
4. Visual instance retrieval with deep convolutional networks
- Author
-
Razavian, Ali Sharif, Sullivan, Josephine, Carlsson, Stefan, and Maki, Atsuto
- Abstract
This paper provides an extensive study on the availability of image representations based on convolutional networks (ConvNets) for the task of visual instance retrieval. Besides the choice of convolutional layers, we present an efficient pipeline exploiting multi-scale schemes to extract local features, in particular, by taking geometric invariance into explicit account, i.e. positions, scales and spatial consistency. In our experiments using five standard image retrieval datasets, we demonstrate that generic ConvNet image representations can outperform other state-of-the-art methods if they are extracted appropriately.
- Published
- 2015
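A minimal NumPy sketch of the kind of retrieval pipeline "Visual instance retrieval with deep convolutional networks" describes — spatially pooled, l2-normalized ConvNet features compared by cosine similarity. This is not the authors' code: the function names, the max-pooling choice, and the simple average over scales are illustrative assumptions.

```python
import numpy as np

def pool_descriptor(feature_map):
    """Max-pool a C x H x W convolutional feature map over the spatial
    dimensions and l2-normalize, giving one C-dim image descriptor."""
    desc = feature_map.max(axis=(1, 2))
    return desc / (np.linalg.norm(desc) + 1e-12)

def multiscale_descriptor(feature_maps):
    """Average the per-scale descriptors (one feature map per input scale),
    then re-normalize; a crude stand-in for the paper's multi-scale scheme."""
    desc = np.mean([pool_descriptor(f) for f in feature_maps], axis=0)
    return desc / (np.linalg.norm(desc) + 1e-12)

def rank_database(query, database):
    """Return database row indices sorted by descending cosine similarity
    (rows of `database` are assumed to be unit-norm descriptors)."""
    return np.argsort(-(database @ query))
```

In a real system the feature maps would come from a trained ConvNet evaluated on several rescaled crops of the image; here any C x H x W arrays stand in for them.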
5. Spotlight the Negatives : A Generalized Discriminative Latent Model
- Author
-
Azizpour, Hossein, Arefiyan, Mostafa, Naderi Parizi, Sobhan, and Carlsson, Stefan
- Abstract
Discriminative latent variable models (LVM) are frequently applied to various visual recognition tasks. In these systems the latent (hidden) variables provide a formalism for modeling structured variation of visual features. Conventionally, latent variables are defined on the variation of the foreground (positive) class. In this work we augment LVMs to include negative latent variables corresponding to the background class. We formalize the scoring function of such a generalized LVM (GLVM). Then we discuss a framework for learning a model based on the GLVM scoring function. We theoretically showcase how some of the current visual recognition methods can benefit from this generalization. Finally, we experiment on a generalized form of Deformable Part Models with negative latent variables and show significant improvements on two different detection tasks.
- Published
- 2015
6. Persistent Evidence of Local Image Properties in Generic ConvNets
- Author
-
Sharif Razavian, Ali, Azizpour, Hossein, Maki, Atsuto, Sullivan, Josephine, Ek, Carl Henrik, and Carlsson, Stefan
- Abstract
Supervised training of a convolutional network for object classification should make explicit any information related to the class of objects and disregard any auxiliary information associated with the capture of the image or the variation within the object class. Does this happen in practice? Although this seems to pertain to the very final layers in the network, if we look at earlier layers we find that this is not the case. Surprisingly, strong spatial information is implicit. This paper addresses this, in particular exploiting the image representation at the first fully connected layer, i.e. the global image descriptor which has recently been shown to be most effective in a range of visual recognition tasks. We empirically demonstrate evidence for the finding in the context of four different tasks: 2D landmark detection, 2D object keypoint prediction, estimation of the RGB values of the input image, and recovery of the semantic label of each pixel. We base our investigation on a simple framework with ridge regression common across these tasks, and show results which all support our insight. Such spatial information can be used for computing correspondence of landmarks to a good accuracy, and should potentially be useful for improving the training of convolutional nets for classification purposes.
- Published
- 2015
- Full Text
- View/download PDF
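The probing used in "Persistent Evidence of Local Image Properties in Generic ConvNets" is ridge regression from a fixed image descriptor to a spatial target, which has a closed-form solution. A hedged NumPy sketch, with my own function names and synthetic data standing in for real ConvNet descriptors and landmark coordinates:

```python
import numpy as np

def ridge_fit(X, Y, lam=1.0):
    """Closed-form ridge regression W = (X^T X + lam*I)^{-1} X^T Y.
    X is n x d (image descriptors), Y is n x k (targets, e.g. 2D
    landmark coordinates or per-pixel properties)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def ridge_predict(X, W):
    """Predict the spatial targets for new descriptors."""
    return X @ W
```

If the targets can be regressed accurately from the descriptor, the descriptor must still carry that spatial information — which is the paper's point.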
7. CNN features off-the-shelf : An Astounding Baseline for Recognition
- Author
-
Sharif Razavian, Ali, Azizpour, Hossein, Sullivan, Josephine, and Carlsson, Stefan
- Abstract
Recent results indicate that the generic descriptors extracted from convolutional neural networks are very powerful. This paper adds to the mounting evidence that this is indeed the case. We report on a series of experiments conducted for different recognition tasks using the publicly available code and model of the OverFeat network, which was trained to perform object classification on ILSVRC13. We use features extracted from the OverFeat network as a generic image representation to tackle the diverse range of recognition tasks of object image classification, scene recognition, fine-grained recognition, attribute detection and image retrieval, applied to a diverse set of datasets. We selected these tasks and datasets as they gradually move further away from the original task and data the OverFeat network was trained to solve. Astonishingly, we report consistently superior results compared to the highly tuned state-of-the-art systems in all the visual classification tasks on various datasets. For instance retrieval it consistently outperforms low-memory-footprint methods except for the sculptures dataset. The results are achieved using a linear SVM classifier (or L2 distance in the case of retrieval) applied to a feature representation of size 4096 extracted from a layer in the net. The representations are further modified using simple augmentation techniques, e.g. jittering. The results strongly suggest that features obtained from deep learning with convolutional nets should be the primary candidate in most visual recognition tasks., Best Paper Runner-up Award
- Published
- 2014
- Full Text
- View/download PDF
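The classification recipe in "CNN features off-the-shelf" is simply a linear SVM trained on fixed 4096-dimensional features. As a sketch, assuming precomputed feature vectors, here is a minimal Pegasos-style subgradient trainer in NumPy — not the authors' code nor any particular SVM package, and the bias handling is a common heuristic rather than part of the original Pegasos formulation:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style subgradient training of a linear SVM on precomputed
    feature vectors X (n x d) with labels y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b, t = np.zeros(d), 0.0, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)          # step size 1/(lam*t)
            margin = y[i] * (X[i] @ w + b)
            w *= (1.0 - eta * lam)          # shrink from the L2 regularizer
            if margin < 1:                  # hinge-loss subgradient step
                w += eta * y[i] * X[i]
                b += eta * y[i]
    return w, b
```

Scoring a new image is then just `np.sign(x @ w + b)` on its feature vector, mirroring the paper's "linear SVM on top of off-the-shelf features" setup.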
8. Initialization framework for latent variable models
- Author
-
Afkham, Heydar Maboudi, Ek, Carl Henrik, and Carlsson, Stefan
- Abstract
In this paper, we discuss the properties of a class of latent variable models that assumes each labeled sample is associated with a set of different features, with no prior knowledge of which feature is the most relevant one to use. Deformable Part Models (DPM) are a good example of such models. While the latent SVM framework (LSVM) has proven to be an efficient tool for solving these models, we argue that the solution found by this tool is very sensitive to the initialization. To decrease this dependency, we propose a novel clustering procedure for these problems that finds cluster centers shared by several sample sets while ignoring the remaining cluster centers. As we show, these cluster centers provide a robust initialization for the LSVM framework.
- Published
- 2014
- Full Text
- View/download PDF
9. A topological framework for training latent variable models
- Author
-
Afkham, Heydar Maboudi, Ek, Carl Henrik, and Carlsson, Stefan
- Abstract
We discuss the properties of a class of latent variable models that assumes each labeled sample is associated with a set of different features, with no prior knowledge of which feature is the most relevant feature to be used. Deformable-Part Models (DPM) can be seen as good examples of such models. These models are usually considered to be expensive to train and very sensitive to the initialization. In this paper, we focus on the learning of such models by introducing a topological framework and show how it is possible to both reduce the learning complexity and produce more robust decision boundaries. We will also argue how our framework can be used for producing robust decision boundaries without exploiting the dataset bias or relying on accurate annotations. To experimentally evaluate our method and compare with previously published frameworks, we focus on the problem of image classification with object localization. In this problem, the correct location of the objects is unknown, during both training and testing stages, and is considered as a latent variable.
- Published
- 2014
- Full Text
- View/download PDF
10. Estimating Attention in Exhibitions Using Wearable Cameras
- Author
-
Sharif Razavian, Ali, Aghazadeh, Omid, Sullivan, Josephine, and Carlsson, Stefan
- Abstract
This paper demonstrates a system for the automatic detection of visual attention and the identification of salient items at exhibitions (e.g. a museum or an auction). The method is offline and operates on video captured by a head-mounted camera. Towards the estimation of attention, we define the notions of "saliency" and "interestingness" for exhibition items. Our method combines multiple state-of-the-art techniques from different vision tasks such as tracking, image matching and retrieval. Many experiments are conducted to evaluate multiple aspects of our method. The method has proven to be robust to image blur, occlusion, truncation, and dimness. The experiments show strong performance on the tasks of matching items, estimating focus frames and detecting salient and interesting items. This can be useful to commercial vendors and museum curators, helping them understand which items appeal most to visitors.
- Published
- 2014
- Full Text
- View/download PDF
11. Gradual improvement of image descriptor quality
- Author
-
Afkham, Heydar Maboudi, Ek, Carl Henrik, and Carlsson, Stefan
- Abstract
In this paper, we propose a framework for gradually improving the quality of an already existing image descriptor. The descriptor used in this paper (Afkham et al., 2013) uses the responses of a series of discriminative components to summarize each image. As we show, this descriptor has an ideal form in which all categories become linearly separable. While reaching this form is not feasible, we argue that by replacing a small fraction of these components it is possible to obtain a descriptor which is, on average, closer to this ideal form. To do so, we initially identify which components do not contribute to the quality of the descriptor and replace them with more robust components, using a joint feature selection method to find the improved components. As our experiments show, this change directly reflects in the capability of the resulting descriptor to discriminate between different categories.
- Published
- 2014
- Full Text
- View/download PDF
12. Qualitative vocabulary based descriptor
- Author
-
Maboudi Afkham, Heydar, Ek, Carl Henrik, and Carlsson, Stefan
- Abstract
Creating a single feature descriptor from a collection of feature responses is a frequently occurring task. The bag-of-words descriptor has been very successful in this role and has been applied to data from a large range of domains. Central to this approach is the association of features to words. In this paper we present a novel approach to the feature-to-word association problem. The proposed method creates a more robust representation when data is noisy and requires fewer words than traditional methods while retaining similar performance. We experimentally evaluate the method on a challenging image classification dataset and show a significant improvement over the state of the art.
- Published
- 2013
13. Extracting essential local object characteristics for 3D object categorization
- Author
-
Madry, Marianna, Maboudi Afkham, Heydar, Ek, Carl Henrik, Carlsson, Stefan, and Kragic, Danica
- Abstract
Most object classes share a considerable amount of local appearance and often only a small number of features are discriminative. The traditional approach to represent an object is based on a summarization of the local characteristics by counting the number of feature occurrences. In this paper we propose the use of a recently developed technique for summarization that, rather than looking into the quantity of features, encodes their quality to learn a description of an object. Our approach is based on extracting and aggregating only the essential characteristics of an object class for a task. We show how the proposed method significantly improves on previous work in 3D object categorization. We discuss the benefits of the method in other scenarios such as robot grasping. We provide extensive quantitative and qualitative experiments comparing our approach to the state of the art to justify the described approach.
- Published
- 2013
- Full Text
- View/download PDF
14. Properties of Datasets Predict the Performance of Classifiers
- Author
-
Aghazadeh, Omid and Carlsson, Stefan
- Published
- 2013
15. 3D pictorial structures for multiple view articulated pose estimation
- Author
-
Burenius, Magnus, Sullivan, Josephine, and Carlsson, Stefan
- Abstract
We consider the problem of automatically estimating the 3D pose of humans from images, taken from multiple calibrated views. We show that it is possible and tractable to extend the pictorial structures framework, popular for 2D pose estimation, to 3D. We discuss how to use this framework to impose view, skeleton, joint angle and intersection constraints in 3D. The 3D pictorial structures are evaluated on multiple view data from a professional football game. The evaluation is focused on computational tractability, but we also demonstrate how a simple 2D part detector can be plugged into the framework.
- Published
- 2013
- Full Text
- View/download PDF
16. Multispectral MRI segmentation of age related white matter changes using a cascade of support vector machines
- Author
-
Damangir, Soheil, Manzouri, Amirhossein, Oppedal, Ketil, Carlsson, Stefan, Firbank, Michael J., Sonnesyn, Hogne, Tysnes, Ole-Bjorn, O'Brien, John T., Beyer, Mona K., Westman, Eric, Aarsland, Dag, Wahlund, Lars-Olof, and Spulber, Gabriela
- Abstract
White matter changes (WMC) are the focus of intensive research and have been linked to cognitive impairment and depression in the elderly. Cumbersome manual outlining procedures make research on WMC labor-intensive and prone to subjective bias. We present a fast, fully automated method for WMC segmentation using a cascade of reduced support vector machines (SVMs) with active learning. Data from 102 subjects were used in this study. Two MRI sequences (T1-weighted and FLAIR) and masks of manually outlined WMC from each subject were used for the image analysis. The segmentation framework comprises pre-processing, classification (training and core segmentation) and post-processing. After pre-processing, the model was trained on two subjects and tested on the remaining 100 subjects. The effectiveness and robustness of the classification were assessed using the receiver operating characteristic technique. The cascade-of-SVMs segmentation framework produced accurate results with high sensitivity (90%) and specificity (99.5%), with the manually outlined WMC as reference. An algorithm for the segmentation of WMC is proposed. This is a fully competitive and fast automatic segmentation framework, capable of using different input sequences without changes or restrictions to the image analysis algorithm.
- Published
- 2012
- Full Text
- View/download PDF
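The cascade idea in the WMC segmentation paper — cheap early stages reject most negative samples so later, more expensive stages see only the hard cases — can be sketched generically. The stage functions below are hypothetical threshold classifiers; the actual method uses reduced SVMs trained with active learning.

```python
import numpy as np

def cascade_predict(stages, X):
    """Apply a cascade of (score_fn, threshold) stages to samples X
    (n x d). A sample is labelled positive only if every stage accepts
    it; most negatives are rejected cheaply by the early stages."""
    keep = np.ones(len(X), dtype=bool)
    for score_fn, thr in stages:
        idx = np.where(keep)[0]        # only still-surviving samples
        if idx.size == 0:
            break
        keep[idx] = score_fn(X[idx]) >= thr
    return keep
```

Each stage only scores the samples that survived the previous stages, which is what makes the cascade fast on volumes dominated by easy background voxels.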
17. Multi view registration for novelty/background separation
- Author
-
Aghazadeh, Omid, Sullivan, Josephine, and Carlsson, Stefan
- Abstract
We propose a system for the automatic segmentation of novelties from the background in scenarios where multiple images of the same environment are available, e.g. obtained by wearable cameras. Our method finds the pixels in a query image corresponding to the underlying background environment by comparing it to reference images of the same scene. This is achieved despite the fact that the images may have different viewpoints and significantly different illumination conditions, and may contain different objects (cars, people, bicycles, etc.) occluding the background. We estimate the probability of each pixel in the query image belonging to the background by computing its appearance inconsistency with the multiple reference images. We then produce multiple segmentations of the query image using an iterated graph cuts algorithm, initializing from these estimated probabilities, and successively combine these segmentations into a final segmentation of the background. Detection of the background in turn highlights the novel pixels. We demonstrate the effectiveness of our approach on a challenging outdoor data set.
- Published
- 2012
- Full Text
- View/download PDF
18. Mixture component identification and learning for visual recognition
- Author
-
Aghazadeh, Omid, Azizpour, Hossein, Sullivan, Josephine, and Carlsson, Stefan
- Abstract
The non-linear decision boundary between object and background classes - due to large intra-class variations - needs to be modelled by any classifier wishing to achieve good results. While a mixture of linear classifiers is capable of modelling this non-linearity, learning this mixture from weakly annotated data is non-trivial and is the paper's focus. Our approach is to identify the modes in the distribution of our positive examples by clustering, and to utilize this clustering in a latent SVM formulation to learn the mixture model. The clustering relies on a robust measure of visual similarity which suppresses uninformative clutter by using a novel representation based on the exemplar SVM. This subtle clustering of the data leads to learning better mixture models, as is demonstrated via extensive evaluations on Pascal VOC 2007. The final classifier, using a HOG representation of the global image patch, achieves performance comparable to the state-of-the-art while being more efficient at detection time.
- Published
- 2012
- Full Text
- View/download PDF
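"Mixture component identification and learning for visual recognition" first identifies modes among the positive examples by clustering, then uses the clusters to initialize latent-SVM mixture training. As a stand-in for its exemplar-SVM-based similarity, here is a plain k-means in NumPy — purely illustrative, with a deterministic spread initialization rather than the paper's representation:

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Plain k-means on row vectors X (n x d); a simplified stand-in for
    the paper's exemplar-SVM similarity clustering, used to split the
    positive examples into mixture components."""
    idx = np.linspace(0, len(X) - 1, k).astype(int)   # spread initial centers
    centers = X[idx].astype(float)
    for _ in range(iters):
        # squared distances of every sample to every center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return labels, centers
```

Each resulting cluster would then seed one linear component of the mixture before latent reassignment takes over during training.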
19. Improving feature level likelihoods using cloud features
- Author
-
Maboudi Afkham, Heydar, Carlsson, Stefan, and Sullivan, Josephine
- Abstract
The performance of many computer vision methods depends on the quality of the local features extracted from the images. For most methods the local features are extracted independently of the task and remain constant throughout the whole process. To make features more dynamic and give models a choice in the features they can use, this work introduces a set of intermediate features referred to as cloud features. These features take advantage of part-based models at the feature level by combining each extracted local feature with its nearby local features, creating a cloud of different representations for each local feature. These representations capture the local variations around the local feature. At classification time, the best possible representation is pulled out of the cloud and used in the calculations. This selection is based on several latent variables encoded within the cloud features. The goal of this paper is to test how the cloud features can improve the feature-level likelihoods. The experiments of this paper focus on feature-level inference and show how replacing single features with equivalent cloud features improves the likelihoods obtained from them. The experiments are conducted on several classes of the MSRCv1 dataset.
- Published
- 2012
20. Human 3D Motion Computation from a varying Number of Cameras
- Author
-
Burenius, Magnus, Sullivan, Josephine, Carlsson, Stefan, and Halvorsen, Kjartan
- Abstract
This paper focuses on how the accuracy of marker-less human motion capture is affected by the number of camera views used. Specifically, we compare the 3D reconstructions calculated from single and multiple cameras. We perform our experiments on data consisting of video from multiple cameras synchronized with ground-truth 3D motion, obtained from a motion capture session with a professional footballer. The error is compared for the 3D reconstructions, of diverse motions, estimated using the manually located image joint positions from one, two or three cameras. We also present a new bundle adjustment procedure using regression splines to impose weak prior assumptions about human motion, temporal smoothness and joint angle limits, on the 3D reconstruction. The results show that even under close to ideal circumstances the monocular 3D reconstructions contain visual artifacts not present in the multiple-view case, indicating that accurate and efficient marker-less human motion capture requires multiple cameras.
- Published
- 2011
- Full Text
- View/download PDF
21. Motion Capture from Dynamic Orthographic Cameras
- Author
-
Burenius, Magnus, Sullivan, Josephine, and Carlsson, Stefan
- Abstract
We present an extension to the scaled orthographic camera model. It deals with dynamic cameras looking at faraway objects. The camera is allowed to change focal length and translate and rotate in 3D. The model we derive says that this motion can be treated as scaling, translation and rotation in a 2D image plane. It is valid if the camera and its target move around in two separate regions that are small compared to the distance between them. We show two applications of this model to motion capture at large distances, i.e. outside a studio, using the affine factorization algorithm. The model is used to motivate theoretically why the factorization can be carried out in a single batch step when having both dynamic cameras and a dynamic object. Furthermore, the model is used to motivate how the position of the object can be reconstructed by measuring the virtual 2D motion of the cameras. For testing we use videos from a real football game and reconstruct the 3D motion of a footballer as he scores a goal.
- Published
- 2011
- Full Text
- View/download PDF
22. Generic Object Class Detection using Feature Maps
- Author
-
Danielsson, Oscar and Carlsson, Stefan
- Abstract
In this paper we describe an object class model and a detection scheme based on feature maps, i.e. binary images indicating occurrences of various local features. Any type of local feature and any number of features can be used to generate feature maps. The choice of which features to use can thus be adapted to the task at hand, without changing the general framework. An object class is represented by a boosted decision tree classifier (which may be cascaded) based on normalized distances to feature occurrences. The resulting object class model is essentially a linear combination of a set of flexible configurations of the features used. Within this framework we present an efficient detection scheme that uses a hierarchical search strategy. We demonstrate experimentally that this detection scheme yields a significant speedup compared to sliding window search. We evaluate the detection performance on a standard dataset [7], showing state of the art results. Features used in this paper include edges, corners, blobs and interest points.
- Published
- 2011
- Full Text
- View/download PDF
23. Gated Classifiers : Boosting under high intra-class variation
- Author
-
Danielsson, Oscar, Rasolzadeh, Babak, and Carlsson, Stefan
- Abstract
In this paper we address the problem of using boosting (e.g. AdaBoost [7]) to classify a target class with significant intra-class variation against a large background class. This situation occurs for example when we want to recognize a visual object class against all other image patches. The boosting algorithm produces a strong classifier, which is a linear combination of weak classifiers. We observe that we often have sets of weak classifiers that individually fire on many examples of the target class but never fire together on those examples (i.e. their outputs are anti-correlated on the target class). Motivated by this observation we suggest a family of derived weak classifiers, termed gated classifiers, that suppress such combinations of weak classifiers. Gated classifiers can be used on top of any original weak learner. We run experiments on two popular datasets, showing that our method reduces the required number of weak classifiers by almost an order of magnitude, which in turn yields faster detectors. We experiment on synthetic data, showing that gated classifiers enable more complex distributions to be represented. We hope that gated classifiers will extend the usefulness of boosted classifier cascades [29].
- Published
- 2011
- Full Text
- View/download PDF
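The core observation in "Gated Classifiers" — pairs of weak classifiers whose outputs are anti-correlated on the target class — can be detected and gated as follows. This is a simplified sketch in NumPy with illustrative names and one fixed gate form; the paper's gate construction inside the boosting loop is more general.

```python
import numpy as np

def anticorrelated_pairs(H, max_corr=-0.5):
    """H is an n x m matrix of weak-classifier outputs (+1/-1) on positive
    examples. Return index pairs whose outputs are strongly anti-correlated,
    i.e. weak classifiers that fire on disjoint subsets of the positives."""
    C = np.corrcoef(H.T)
    m = H.shape[1]
    return [(i, j) for i in range(m) for j in range(i + 1, m)
            if C[i, j] <= max_corr]

def gate(h_i, h_j):
    """A derived feature that fires when either constituent fires but
    suppresses the case of both firing, which (for an anti-correlated
    pair) never occurs on the positive examples."""
    both = (h_i > 0) & (h_j > 0)
    either = (h_i > 0) | (h_j > 0)
    return np.where(either & ~both, 1.0, -1.0)
```

The gated feature is then handed back to the weak learner pool, letting boosting express "A or B but not both" with a single weak classifier.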
24. Projectable Classifiers for Multi-View Object Class Recognition
- Author
-
Danielsson, Oscar and Carlsson, Stefan
- Abstract
We propose a multi-view object class modeling framework based on a simplified camera model and surfels (defined by a location and normal direction in a normalized 3D coordinate system) that mediate coarse correspondences between different views. Weak classifiers are learnt relative to the reference frames provided by the surfels. We describe a weak classifier that uses contour information when its corresponding surfel projects to a contour element in the image and color information when the face of the surfel is visible in the image. We emphasize that these weak classifiers can possibly take many different forms and use many different image features. Weak classifiers are combined using AdaBoost. We evaluate the method on a public dataset [8], showing promising results on categorization, recognition/detection, pose estimation and image synthesis.
- Published
- 2011
- Full Text
- View/download PDF
25. Novelty Detection from an Ego-Centric perspective
- Author
-
Aghazadeh, Omid, Sullivan, Josephine, and Carlsson, Stefan
- Abstract
This paper demonstrates a system for the automatic extraction of novelty in images captured from a small video camera attached to a subject's chest, replicating his visual perspective, while performing activities which are repeated daily. Novelty is detected when a (sub)sequence cannot be registered to previously stored sequences captured while performing the same daily activity. Sequence registration is performed by measuring appearance and geometric similarity of individual frames and exploiting the invariant temporal order of the activity. Experimental results demonstrate that this is a robust way to detect novelties induced by variations in the wearer's ego-motion such as stopping and talking to a person. This is an essentially new and generic way of automatically extracting information of interest to the camera wearer and can be used as input to a system for life logging or memory support.
- Published
- 2011
- Full Text
- View/download PDF
26. Generic Object Class Detection using Boosted Configurations of Oriented Edges
- Author
-
Danielsson, Oscar and Carlsson, Stefan
- Abstract
In this paper we introduce a new representation for shape-based object class detection. This representation is based on very sparse and slightly flexible configurations of oriented edges. An ensemble of such configurations is learnt in a boosting framework. Each edge configuration can capture some local or global shape property of the target class, so the representation is not limited to representing and detecting visual classes that have distinctive local structures. The representation is also able to handle significant intra-class variation. It allows for very efficient detection and can be learnt automatically from weakly labelled training images of the target class. The main drawback of the method is that, since its inductive bias is rather weak, it needs a comparatively large training set. We evaluate on a standard database [1] and, when using a slightly extended training set, our method outperforms the state of the art [2] on four out of five classes.
- Published
- 2010
- Full Text
- View/download PDF
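The boosting framework mentioned in the abstract above can be sketched as discrete AdaBoost over precomputed weak-classifier outputs. This is a generic illustration, not the paper's actual edge-configuration weak learners: each row of `weak_outputs` stands in for one candidate configuration's ±1 responses on the training set.

```python
import numpy as np

def adaboost(weak_outputs, labels, rounds):
    """Discrete AdaBoost. weak_outputs: (n_weak, n_samples) in {-1,+1};
    labels: (n_samples,) in {-1,+1}. Returns chosen indices and weights."""
    n_weak, n = weak_outputs.shape
    w = np.full(n, 1.0 / n)            # uniform sample weights to start
    chosen, alphas = [], []
    for _ in range(rounds):
        # weighted error of every candidate weak classifier
        errs = np.array([(w * (h != labels)).sum() for h in weak_outputs])
        j = int(np.argmin(errs))
        eps = max(float(errs[j]), 1e-12)
        if eps >= 0.5:
            break                      # no weak learner better than chance
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        # up-weight the samples the chosen classifier got wrong
        w *= np.exp(-alpha * labels * weak_outputs[j])
        w /= w.sum()
        chosen.append(j)
        alphas.append(alpha)
    return chosen, alphas

def strong_classify(weak_outputs, chosen, alphas):
    """Sign of the weighted vote of the selected weak classifiers."""
    score = sum(a * weak_outputs[j] for j, a in zip(chosen, alphas))
    return np.sign(score)
```

After a few rounds the weighted ensemble can separate samples that no single weak classifier classifies correctly on its own.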
27. Automatic Learning and Extraction of Multi-Local Features
- Author
-
Danielsson, Oscar, Carlsson, Stefan, and Sullivan, Josephine
- Abstract
In this paper we introduce a new kind of feature, the multi-local feature, so named as each one is a collection of local features, such as oriented edgels, in a very specific spatial arrangement. A multi-local feature has the ability to capture underlying constant shape properties of exemplars from an object class. Thus it is particularly suited to representing and detecting visual classes that lack distinctive local structures and are mainly defined by their global shape. We present algorithms to automatically learn an ensemble of these features to represent an object class from weakly labelled training images of that class, as well as procedures to detect these features efficiently in novel images. The power of multi-local features is demonstrated by using the ensemble in a simple voting scheme to perform object category detection on a standard database. Despite its simplicity, this scheme yields detection rates matching state-of-the-art object detection systems.
- Published
- 2009
- Full Text
- View/download PDF
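The "simple voting scheme" in the abstract above can be pictured as Hough-style voting for the object centre. This is a toy version under stated assumptions: each detected feature is assumed to carry a training-time offset `(dx, dy)` to the object centre, and the bin size is arbitrary; the paper's actual procedure is not reproduced here.

```python
import numpy as np

def vote_for_centers(detections, image_shape, bin_size=8):
    """Each detection is (x, y, dx, dy): a feature found at (x, y) whose
    learned offset to the object centre is (dx, dy). Every detection casts
    one vote for a predicted centre; the accumulator peak is the
    hypothesised object location."""
    h_bins = image_shape[0] // bin_size + 1
    w_bins = image_shape[1] // bin_size + 1
    acc = np.zeros((h_bins, w_bins))
    for x, y, dx, dy in detections:
        cx, cy = x + dx, y + dy
        if 0 <= cy < image_shape[0] and 0 <= cx < image_shape[1]:
            acc[int(cy) // bin_size, int(cx) // bin_size] += 1
    by, bx = np.unravel_index(np.argmax(acc), acc.shape)
    # return the centre of the winning bin and its vote count
    return (bx * bin_size + bin_size // 2, by * bin_size + bin_size // 2), acc.max()
```

Consistent detections concentrate their votes in one bin, so a single spurious feature cannot move the peak.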
28. Object Detection using Multi-Local Feature Manifolds
- Author
-
Danielsson, Oscar, Carlsson, Stefan, and Sullivan, Josephine
- Abstract
Many object categories are better characterized by the shape of their contour than by local appearance properties like texture or color. Multi-local features are designed to capture the global discriminative structure of an object while avoiding the drawbacks of traditional global descriptors, such as sensitivity to irrelevant image properties. The specific structure of multi-local features allows us to generate new feature exemplars by linear combinations, which effectively increases the set of stored training exemplars. We demonstrate that a multi-local feature is a good "weak detector" of shape-based object categories and that it can accurately estimate the bounding box of objects in an image. Using just a single multi-local feature descriptor we obtain detection results comparable to those of more complex and elaborate systems. It is our opinion that multi-local features have great potential as generic object descriptors, with very interesting possibilities for feature sharing within and between classes.
- Published
- 2008
- Full Text
- View/download PDF
29. Exploiting Part-Based Models and Edge Boundaries for Object Detection
- Author
-
Sullivan, Josephine, Danielsson, Oscar, and Carlsson, Stefan
- Abstract
This paper explores how to exploit shape information to perform object class recognition. We use a sparse part-based model to describe object categories defined by shape. The sparseness allows the relative spatial relationships between parts to be described simply. With this model it is possible to highlight potential locations of the object and its parts in novel images. These areas are subsequently examined by a more flexible shape model that measures whether the image data provides evidence for boundary/connecting curves between connected hypothesized parts. From these measurements a very simple cost function can be constructed which indicates the presence or absence of the object class. The part-based model is designed to decouple variations due to affine warps from other forms of shape deformation. The latter are modeled probabilistically using conditional probability distributions which describe the linear dependencies between the location of a part and a subset of the other parts. These conditional distributions can then be exploited to search efficiently for instances of the part model in novel images. Results are reported on experiments performed on the ETHZ shape classes database, which features heavily cluttered images and large variations in scale.
- Published
- 2008
- Full Text
- View/download PDF
30. Tracking and labelling of interacting multiple targets
- Author
-
Sullivan, Josephine and Carlsson, Stefan
- Abstract
Successful multi-target tracking requires solving two problems: localizing the targets and labelling their identities. An isolated target's identity can be unambiguously preserved from one frame to the next. However, in long sequences with many moving targets, like a football game, grouping scenarios will occur in which identity labellings cannot be maintained reliably using continuity of motion or appearance. This paper describes how to match targets' identities despite these interactions. Trajectories covering the periods when a target is isolated are found; these trajectories end when targets interact and their labellings cannot be maintained. The interactions (merges and splits) of these trajectories form a graph structure. Appropriate feature vectors summarizing particular qualities of each trajectory are extracted, and a clustering procedure based on these feature vectors allows the identities of temporally separated trajectories to be matched. Results are shown from a football match captured by a wide-screen system giving a full stationary view of the pitch.
- Published
- 2006
- Full Text
- View/download PDF
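The clustering step in the abstract above can be sketched very schematically. In the paper, each feature vector summarises qualities of one trajectory (appearance, motion); here they are just generic vectors, and the grouping rule is a simple greedy distance threshold rather than the paper's actual procedure.

```python
import numpy as np

def match_trajectories(features, threshold):
    """Greedy agglomerative grouping: trajectories whose summary feature
    vectors lie within `threshold` (Euclidean distance) are assigned the
    same identity label, so temporally separated trajectories of the same
    target end up matched."""
    n = len(features)
    labels = list(range(n))          # each trajectory starts as its own identity
    feats = [np.asarray(f, float) for f in features]
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(feats[i] - feats[j]) < threshold:
                # merge: relabel everything in j's group into i's group
                old, new = labels[j], labels[i]
                labels = [new if l == old else l for l in labels]
    return labels
```

Two trajectories with nearly identical descriptors receive the same label even if they never overlap in time.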
31. Maximizing validity in 2D motion analysis
- Author
-
Eriksson, Martin and Carlsson, Stefan
- Abstract
Classifying and analyzing human motion from video is relatively common in many areas. Since the motion is carried out in 3D space, the 2D projection provided by a video is somewhat limiting. The question we investigate in this article is how much information is actually lost when going from 3D to 2D, and how this information loss depends on factors such as viewpoint and the tracking errors that will inevitably occur if the 2D sequences are analysed automatically.
- Published
- 2004
- Full Text
- View/download PDF
32. Appearance based qualitative image description for object class recognition
- Author
-
Thureson, Johan and Carlsson, Stefan
- Abstract
The problem of recognizing classes of objects, as opposed to specific instances, requires methods of comparing images that capture the variation within the class while discriminating against objects outside the class. We present a simple method for image description based on histograms of qualitative shape indexes computed from combinations of triplets of sampled locations and gradient directions in the image. We demonstrate that this method is indeed able to capture variation within classes of objects, and we apply it to the problem of recognizing four different categories from a large database. Using our descriptor on the whole image, containing varying degrees of background clutter, we obtain results for two of the objects that are superior to the best results published so far for this database. By cropping images manually we demonstrate that our method has the potential to handle the other objects as well when supplied with an algorithm for searching the image. We argue that our method, based on qualitative image properties, captures the large range of variation that is typically encountered within an object class. This means that it can be used on substantially larger image patches than existing methods based on simpler criteria for evaluating image similarity.
- Published
- 2004
- Full Text
- View/download PDF
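The triplet-based descriptor in the abstract above can be illustrated with a deliberately simplified qualitative index: the orientation sign (clockwise vs. counter-clockwise) of each point triplet combined with the three quantised gradient directions. This specific index and binning are hypothetical, chosen only to show the shape of the computation, not the paper's actual shape indexes.

```python
import itertools
import numpy as np

def triplet_histogram(points, grad_dirs, n_dir_bins=4):
    """Histogram of qualitative indices over all point triplets.
    points: list of (x, y); grad_dirs: gradient direction (radians)
    at each point. Returns a normalised descriptor vector."""
    bins = 2 * n_dir_bins ** 3
    hist = np.zeros(bins)
    # quantise each gradient direction into n_dir_bins sectors
    dirs = (np.asarray(grad_dirs) / (2 * np.pi) * n_dir_bins).astype(int) % n_dir_bins
    for i, j, k in itertools.combinations(range(len(points)), 3):
        (x1, y1), (x2, y2), (x3, y3) = points[i], points[j], points[k]
        # qualitative part: orientation sign of the point triangle
        cross = (x2 - x1) * (y3 - y1) - (y2 - y1) * (x3 - x1)
        s = 1 if cross > 0 else 0
        idx = s * n_dir_bins ** 3 + (dirs[i] * n_dir_bins + dirs[j]) * n_dir_bins + dirs[k]
        hist[idx] += 1
    return hist / max(hist.sum(), 1)
```

Because each bin depends only on relative geometry and coarse directions, the histogram tolerates the within-class variation the abstract emphasises.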
33. Monocular reconstruction of human motion by qualitative selection
- Author
-
Eriksson, Martin and Carlsson, Stefan
- Abstract
One of the main difficulties when reconstructing human motion from monocular video is the depth ambiguity. Achieving a reconstruction, given the projection of the joints, can be regarded as a search problem, where the objective is to find the most likely configuration. One inherent problem in such a formulation is the definition of "most likely". In this work we pick the configuration that best complies with a set of training data in a qualitative sense. The reason for doing this is to allow for large individual variation within the class of motions and to avoid an extreme bias towards the training data. In order to capture the qualitative constraints, we use a set of 3D motion capture data of walking people. The method is tested on orthographic projections of motion capture data, in order to compare the achieved reconstruction with the original motion.
- Published
- 2004
- Full Text
- View/download PDF
34. Monocular 3D reconstruction of human motion in long action sequences
- Author
-
Loy, Gareth, Eriksson, Martin, Sullivan, Josephine, and Carlsson, Stefan
- Abstract
A novel algorithm is presented for the 3D reconstruction of human action in long (> 30 second) monocular image sequences. A sequence is represented by a small set of automatically found representative keyframes. The skeletal joint positions are manually located in each keyframe and mapped to all other frames in the sequence. For each keyframe a 3D key pose is created, and interpolation between these 3D body poses, together with the incorporation of limb length and symmetry constraints, provides a smooth initial approximation of the 3D motion. This is then fitted to the image data to generate a realistic 3D reconstruction. The degree of manual input required is controlled by the diversity of the sequence's content. Sports footage is ideally suited to this approach as it frequently contains a limited number of repeated actions. Our method is demonstrated on a long (36 second) sequence of a woman playing tennis filmed with a non-stationary camera. This sequence required manual initialisation on < 1.5% of the frames, and demonstrates that the system can deal with very rapid motion, severe self-occlusions, motion blur and clutter occurring over several concurrent frames. The monocular 3D reconstruction is verified by synthesising a view from the perspective of a 'ground truth' reference camera, and the result is seen to provide a qualitatively accurate 3D reconstruction of the motion.
- Published
- 2004
- Full Text
- View/download PDF
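The interpolation between 3D key poses described in the abstract above can be sketched as piecewise-linear interpolation of joint positions. This is only the initial-approximation step, without the limb-length and symmetry constraints or the subsequent image fitting; the linear scheme is an assumption (the paper does not specify the interpolant here).

```python
import numpy as np

def interpolate_poses(key_frames, key_poses, query_frame):
    """Piecewise-linear interpolation of 3D key poses.
    key_frames: sorted frame indices of the keyframes.
    key_poses: list of (n_joints, 3) arrays, one per keyframe.
    Returns the interpolated (n_joints, 3) pose at query_frame."""
    key_frames = np.asarray(key_frames)
    if query_frame <= key_frames[0]:
        return np.asarray(key_poses[0], float)
    if query_frame >= key_frames[-1]:
        return np.asarray(key_poses[-1], float)
    # locate the keyframe interval containing the query frame
    i = int(np.searchsorted(key_frames, query_frame, side="right")) - 1
    t = (query_frame - key_frames[i]) / (key_frames[i + 1] - key_frames[i])
    a, b = np.asarray(key_poses[i], float), np.asarray(key_poses[i + 1], float)
    return (1.0 - t) * a + t * b
```

Sampling this interpolant at every frame yields the smooth initial 3D motion that is then refined against the image data.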
35. Method and device for generating wide image sequences
- Author
-
Carlsson, Stefan, Hayman, Eric, and Sullivan, Josephine
- Abstract
The invention relates to a video recording apparatus comprising: a microprocessor (130); a memory means (120) storing a program for generating a set of calibration parameters related to a device having at least two video cameras arranged in a predetermined relationship to each other, said parameters being unique to the at least two cameras and their current location relative to the object being recorded, said memory means (120) also storing a program for the recording of wide image video sequences; read and write memory means (140) for storing data relating to recorded video sequences from at least two video cameras; input means (300) for manual input of parameters and input of recorded video sequences; and output means (300) for output of a wide image video sequence. The invention also relates to a method for generating a wide image video sequence, said method comprising the steps of: generating a set of calibration parameters related to a device having at least two video cameras arranged in a predetermined relationship to each other, said parameters being unique to the at least two cameras and their current location relative to the object being recorded; recording video sequences synchronously using each of said at least two video cameras; and generating a wide image video sequence from said synchronously recorded video sequences.
- Published
- 2004
36. Large Scale, Large Margin Classification using Indefinite Similarity Measures
- Author
-
Aghazadeh, Omid and Carlsson, Stefan