9 results on "Monay, Florent"
Search Results
2. Modeling semantic aspects for cross-media image indexing
- Author
-
Monay, Florent and Gatica-Perez, Daniel
- Subjects
Technology application, Semantics -- Technology application, Automatic indexing -- Methods, Image processing -- Methods - Abstract
To go beyond the query-by-example paradigm in image retrieval, there is a need for semantic indexing of large image collections for intuitive text-based image search. Different models have been proposed to learn the dependencies between the visual content of an image set and the associated text captions, thus allowing for the automatic creation of semantic indexes for unannotated images. The task, however, remains unsolved. In this paper, we present three alternatives to learn a Probabilistic Latent Semantic Analysis (PLSA) model for annotated images and evaluate their respective performance for automatic image indexing. Under the PLSA assumptions, an image is modeled as a mixture of latent aspects that generates both image features and text captions, and we investigate three ways to learn the mixture of aspects. We also propose a more discriminative image representation than the traditional Blob histogram, concatenating quantized local color information and quantized local texture descriptors. The first learning procedure of a PLSA model for annotated images is a standard Expectation-Maximization (EM) algorithm, which implicitly assumes that the visual and the textual modalities can be treated equivalently. The other two models are based on an asymmetric PLSA learning, which makes it possible to constrain the definition of the latent space on the visual or on the textual modality. We demonstrate that the textual modality is more appropriate to learn a semantically meaningful latent space, which translates into improved annotation performance. A comparison of our learning algorithms with respect to recent methods on a standard data set is presented, and a detailed evaluation of the performance shows the validity of our framework. Index Terms--Image annotation, textual indexing, image retrieval, quantized local descriptors, latent aspect modeling.
- Published
- 2007
3. A thousand words in a scene
- Author
-
Quelhas, Pedro, Monay, Florent, Odobez, Jean-Marc, Gatica-Perez, Daniel, and Tuytelaars, Tinne
- Subjects
Technology application, Object recognition (Computers) -- Research, Pattern recognition -- Research, Visual perception -- Technology application, Text processing -- Evaluation - Abstract
This paper presents a novel approach for visual scene modeling and classification, investigating the combined use of text modeling methods and local invariant features. Our work attempts to elucidate 1) whether a textlike bag-of-visterms (BOV) representation (histogram of quantized local visual features) is suitable for scene (rather than object) classification, 2) whether some analogies between discrete scene representations and text documents exist, and 3) whether unsupervised, latent space models can be used both as feature extractors for the classification task and to discover patterns of visual co-occurrence. Using several data sets, we validate our approach, presenting and discussing experiments on each of these issues. We first show, with extensive experiments on binary and multiclass scene classification tasks using a 9,500-image data set, that the BOV representation consistently outperforms classical scene classification approaches. In other data sets, we show that our approach competes with or outperforms other recent more complex methods. We also show that Probabilistic Latent Semantic Analysis (PLSA) generates a compact scene representation, is discriminative for accurate classification, and is more robust than the BOV representation when less labeled training data is available. Finally, through aspect-based image ranking experiments, we show the ability of PLSA to automatically extract visually meaningful scene patterns, making such representation useful for browsing image collections. Index Terms--Image representation, scene classification, object recognition, quantized local descriptors, latent aspect modeling.
- Published
- 2007
4. Contextual Classification of Image Patches with Latent Aspect Models
- Author
-
Quelhas, Pedro, Monay, Florent, Odobez, Jean-Marc, and Gatica-Perez, Daniel
- Subjects
Electronics, TK7800-8360 - Abstract
We present a novel approach for contextual classification of image patches in complex visual scenes, based on the use of histograms of quantized features and probabilistic aspect models. Our approach uses context in two ways: (1) by using the fact that specific learned aspects correlate with the semantic classes, which resolves some cases of visual polysemy often present in patch-based representations, and (2) by formalizing the notion that scene context is image-specific: what an individual patch represents depends on what the rest of the patches in the same image represent. We demonstrate the validity of our approach on a man-made versus natural patch classification problem. Experiments on an image collection of complex scenes show that the proposed approach improves region discrimination, producing satisfactory results and outperforming two noncontextual methods. Furthermore, we also show that co-occurrence and traditional (Markov random field) spatial contextual information can be conveniently integrated for further improved patch classification.
- Published
- 2009
5. Exploiting Scene Cues for Dropped Object Detection
- Author
-
Lopez-Mendez, Adolfo, Monay, Florent, and Odobez, Jean-Marc
- Abstract
This paper presents a method for the automated detection of dropped objects in surveillance scenarios, which is a very important task for abandoned object detection. Our method works in single views and exploits prior information about the scene, such as geometry or the fact that a number of false alarms are caused by known objects, such as humans. The proposed approach builds dropped object candidates by analyzing blobs obtained with a multi-layer background subtraction approach. The created dropped object candidates are then characterized both by appearance and by temporal aspects such as the estimated drop time. Next, we incorporate prior knowledge about the possible sizes and positions of dropped objects through an efficient filtering approach. Finally, the output of a human detector is exploited to filter out static objects that are likely to be humans that remain still. Experimental results on the publicly available PETS2006 datasets and on several long sequences recorded in metro stations show the effectiveness of the proposed approach. Furthermore, our approach can operate in real-time.
6. Learning the structure of image collections with latent aspect models
- Author
-
Monay, Florent, Bourlard, Hervé, and Gatica-Perez, Daniel
- Subjects
scene classification, latent aspect model, classification de scènes, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, graphical model, object classification, recherche d'images, modèle d'aspects latents, classification d'objets, image segmentation, modèles graphiques, image retrieval, segmentation d'images - Abstract
The approach to indexing an image collection depends on the type of data to organize. Satellite images are likely to be searched with latitude and longitude coordinates, medical images are often searched with an image example that serves as a visual query, and personal image collections are generally browsed by event. A more general retrieval scenario is based on the use of textual keywords to search for images containing a specific object, or representing a given scene type. This requires the manual annotation of each image in the collection to allow for the retrieval of relevant visual information based on a text query. This time-consuming and subjective process is the current price to pay for a reliable and convenient text-based image search. This dissertation investigates the use of probabilistic models to assist the automatic organization of image collections, attempting to link the visual content of digital images with a potential textual description. Relying on robust, patch-based image representations that have proven to capture a variety of visual content, our work proposes to model images as mixtures of latent aspects. These latent aspects are defined by multinomial distributions that capture patch co-occurrence information observed in the collection. An image is not represented by the direct count of its constituting elements, but as a mixture of latent aspects that can be estimated with principled, generative unsupervised learning methods. An aspect-based image representation therefore incorporates contextual information from the whole collection that can be exploited. This emerging concept is explored for several fundamental tasks related to image retrieval - namely classification, clustering, segmentation, and annotation - in what represents one of the first coherent and comprehensive studies of the subject.
We first investigate the possibility of classifying images based on their estimated aspect mixture weights, interpreting latent aspect modeling as an unsupervised feature extraction process. Several image categorization tasks are considered, where images are classified based on the present objects or according to their global scene type. We demonstrate that the concept of latent aspects makes it possible to take advantage of non-labeled data to infer a robust image representation that achieves a higher classification performance than the original patch-based representation. Second, further exploring the concept, we show that aspects can correspond to an interesting soft clustering of an image collection that can serve as a browsing structure. Images can be ranked given an aspect, illustrating the corresponding co-occurrence context visually. Third, we derive a principled method that relies on latent aspects to classify image patches into different categories. This produces an image segmentation based on the resulting spatial class-densities. We finally propose to model images and their captions with a single aspect model, merging the co-occurrence contexts of the visual and the textual modalities in different ways. Once a model has been learned, the distribution of words given an unseen image is inferred based on its visual representation, and serves as textual indexing. Overall, we demonstrate with extensive experiments that the co-occurrence context captured by latent aspects is suitable for the above-mentioned tasks, making it a promising approach for multimedia indexing.
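The annotation step described in the dissertation abstract, inferring a word distribution for an unseen image from a joint visual-textual aspect model, reduces to one mixture computation. A minimal sketch follows; the vocabulary, aspect count, and all distributions are toy stand-ins for what a trained model would provide.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = ["sky", "water", "tree", "building", "person", "car"]
n_aspects = 3

# Stand-ins for quantities a trained joint aspect model would provide:
p_w_z = rng.random((n_aspects, len(vocab)))
p_w_z /= p_w_z.sum(axis=1, keepdims=True)      # p(word | aspect)
p_z_d = rng.dirichlet(np.ones(n_aspects))      # p(aspect | new image),
                                               # inferred from its visual features

# Textual index: p(w|d) = sum_z p(w|z) p(z|d), then rank the words
p_w_d = p_z_d @ p_w_z
annotation = [vocab[i] for i in np.argsort(p_w_d)[::-1][:3]]
```

The top-ranked words form the automatic annotation used for text-based retrieval.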
7. EYEDIAP Database: Data Description and Gaze Tracking Evaluation Benchmarks
- Author
-
Funes Mora, Kenneth Alberto, Monay, Florent, and Odobez, Jean-Marc
- Abstract
The lack of a common benchmark for the evaluation of the gaze estimation task from RGB and RGB-D data is a serious limitation for distinguishing the advantages and disadvantages of the many proposed algorithms found in the literature. The EYEDIAP database intends to overcome this limitation by providing a common framework for the training and evaluation of gaze estimation approaches. In particular, this database has been designed to enable the evaluation of the robustness of algorithms with respect to the main challenges associated with this task: i) head pose variations; ii) person variation; iii) changes in ambient and sensing conditions; and iv) type of target: screen or 3D object. This technical report contains an extended description of the database: we include the processing methodology for the elements provided along with the raw data, the database organization, and additional benchmarks we consider relevant to evaluate diverse properties of a given gaze estimator.
8. Integrating co-occurrence and spatial contexts on patch-based scene segmentation
- Author
-
Monay, Florent, Quelhas, Pedro, Odobez, Jean-Marc, and Gatica-Perez, Daniel
- Subjects
vision - Abstract
We present a novel approach for contextual segmentation of complex visual scenes, based on the use of bags of local invariant features (visterms) and probabilistic aspect models. Our approach uses context in two ways: (1) by using the fact that specific learned aspects correlate with the semantic classes, which resolves some cases of visual polysemy, and (2) by formalizing the notion that scene context is image-specific: what an individual visterm represents depends on what the rest of the visterms in the same bag represent. We demonstrate the validity of our approach on a man-made vs. natural visterm classification problem. Experiments on an image collection of complex scenes show that the approach improves region discrimination, producing satisfactory results, and outperforming a non-contextual method. Furthermore, through the later use of a Markov Random Field model, we also show that co-occurrence and spatial contextual information can be conveniently integrated for improved visterm classification.
9. On Automatic Annotation of Images with Latent Space Models
- Author
-
Monay, Florent and Gatica-Perez, Daniel
- Subjects
vision, ComputingMethodologies_PATTERNRECOGNITION, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION - Abstract
Image auto-annotation, i.e., the association of words to whole images, has attracted considerable attention. In particular, unsupervised, probabilistic latent variable models of text and image features have shown encouraging results, but their performance with respect to other approaches remains unknown. In this paper, we apply and compare two simple latent space models commonly used in text analysis, namely Latent Semantic Analysis (LSA) and Probabilistic LSA (PLSA). Annotation strategies for each model are discussed. Remarkably, we found that, on an 8,000-image dataset, a classic LSA model defined on keywords and a very basic image representation performed as well as much more complex, state-of-the-art methods. Furthermore, non-probabilistic methods (LSA and direct image matching) outperformed PLSA on the same dataset.
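The LSA annotation strategy compared in this paper can be sketched with a truncated SVD of a joint keyword-plus-visual-feature matrix. The sizes, data, and rank below are toy placeholders, not the paper's configuration: the idea is only that projecting an unannotated image (keyword block zeroed) into the latent space and reconstructing it yields keyword scores to rank.

```python
import numpy as np

rng = np.random.default_rng(0)

# Joint training matrix: each row concatenates a keyword histogram and a
# visual-feature histogram for one annotated image (toy random data here)
n_train, n_keywords, n_visual = 40, 12, 20
X = rng.random((n_train, n_keywords + n_visual))

# LSA: keep the top-k right singular vectors as the latent basis
k = 5
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Vk = Vt[:k]                                  # (k, n_features)

# Annotate a new image: zero the keyword block, project, reconstruct
query = np.concatenate([np.zeros(n_keywords), rng.random(n_visual)])
latent = Vk @ query
reconstructed = Vk.T @ latent
keyword_scores = reconstructed[:n_keywords]  # rank these for annotation
top = np.argsort(keyword_scores)[::-1][:5]
```

The reconstruction fills in the missing keyword block from co-occurrence structure learned over the training set, which is the mechanism the paper's LSA annotation relies on.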