15 results for "image auto-annotation"
Search Results
2. Training Visual-Semantic Embedding Network for Boosting Automatic Image Annotation.
- Author
- Zhang, Weifeng, Hu, Hua, and Hu, Haiyang
- Subjects
- COMPUTER vision, ANNOTATIONS, ARTIFICIAL intelligence, IMAGE processing, ACTIVE computer vision
- Abstract
Image auto-annotation, which annotates images according to their semantic contents, has become a research focus in computer vision, as it helps people edit, retrieve and understand large image collections. In the last decades, researchers have proposed many approaches to this task and achieved remarkable performance on several standard image datasets. In this paper, we train neural networks using visual and semantic ranking losses to learn a visual-semantic embedding. This embedding can be easily applied to nearest-neighbor based models to boost their performance on image auto-annotation. We test our method on four challenging image datasets, reporting comparison results with existing works. Experimental results show that our method can be applied to several state-of-the-art nearest-neighbor based models, including TagProp and 2PKNN, and significantly improves their performance. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
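The ranking loss described in this abstract maps images and their tags into a shared space where matching pairs score higher than mismatched ones. Below is a minimal PyTorch sketch of that idea; the layer sizes, margin, and in-batch negative sampling are illustrative assumptions, not the authors' architecture.

```python
# A minimal sketch (not the authors' architecture): learn a joint embedding
# where an image lies close to its own tag vector and far from others.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualSemanticEmbedding(nn.Module):
    def __init__(self, img_dim=4096, tag_dim=300, embed_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, embed_dim)   # visual branch
        self.tag_proj = nn.Linear(tag_dim, embed_dim)   # semantic branch

    def forward(self, img_feats, tag_feats):
        # L2-normalise so cosine similarity reduces to a dot product
        v = F.normalize(self.img_proj(img_feats), dim=1)
        t = F.normalize(self.tag_proj(tag_feats), dim=1)
        return v, t

def ranking_loss(v, t, margin=0.2):
    # Hinge ranking loss over all in-batch negatives, in both directions.
    sim = v @ t.T                                   # (B, B) similarities
    pos = sim.diag().unsqueeze(1)                   # matching image-tag pairs
    cost_img = (margin + sim - pos).clamp(min=0)    # image -> wrong tags
    cost_tag = (margin + sim - pos.T).clamp(min=0)  # tag -> wrong images
    mask = ~torch.eye(sim.size(0), dtype=torch.bool)
    return cost_img[mask].mean() + cost_tag[mask].mean()

# Toy usage: nearest-neighbour annotators (e.g. TagProp, 2PKNN) would then
# measure image-image distances in the learned embedding space.
model = VisualSemanticEmbedding()
imgs, tags = torch.randn(8, 4096), torch.randn(8, 300)
v, t = model(imgs, tags)
ranking_loss(v, t).backward()
```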
3. Combining Image-Level and Segment-Level Models for Automatic Annotation
- Author
- Kuettel, Daniel, Guillaumin, Matthieu, and Ferrari, Vittorio; in: Schoeffmann, Klaus, Merialdo, Bernard, Hauptmann, Alexander G., Ngo, Chong-Wah, Andreopoulos, Yiannis, and Breiteneder, Christian, editors
- Published
- 2012
- Full Text
- View/download PDF
4. Approximation of Linear Discriminant Analysis for Word Dependent Visual Features Selection
- Author
- Glotin, Hervé, Tollari, Sabrina, and Giraudet, Pascale; in: Blanc-Talon, Jacques, Philips, Wilfried, Popescu, Dan, and Scheunders, Paul, editors
- Published
- 2005
- Full Text
- View/download PDF
5. Image auto-annotation via concept interdependency network.
- Author
- Xu, HaiJiao, Pan, Peng, Xu, ChunYan, Lu, YanSheng, and Chen, Deng
- Subjects
- MULTIMEDIA computer applications, COMPUTER graphics, QUALITY control of information storage & retrieval systems, ANNOTATIONS, IMAGE processing
- Abstract
With the explosive growth of multimedia data such as unlabeled images on the Web, image auto-annotation has been receiving increasing research interest. By automatically assigning a set of concepts to unlabeled images, image retrieval can be performed over labeled concepts. Most existing studies focus on the relations between images and concepts, and ignore the interdependencies between labeled concepts. In this paper, we propose a novel image auto-annotation model which utilizes a concept interdependency network to achieve better image auto-annotation. When a concept and its interdependent concepts have a high co-occurrence frequency in the training set, we boost the chance of predicting this concept if there is strong visual evidence for the interdependent concepts in an unlabeled image. Additionally, we combine the global concept interdependency and the local concept interdependency to enhance the auto-annotation performance. Extensive experiments on the Corel and IAPR datasets show that the proposed approach outperforms almost all existing methods. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
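The boosting idea in this abstract (lift a concept's score when concepts that frequently co-occur with it have strong visual evidence) can be sketched with a co-occurrence matrix estimated from training tags. The update rule below is an assumption for illustration, not the paper's exact interdependency network.

```python
# Illustrative sketch of co-occurrence-based score boosting (the update
# rule here is an assumption, not the paper's exact model).
import numpy as np

def cooccurrence_matrix(train_labels):
    """train_labels: (N, C) binary matrix of training-set concepts."""
    Y = np.asarray(train_labels, dtype=float)
    counts = Y.T @ Y                       # C x C co-occurrence counts
    freq = Y.sum(axis=0).clip(min=1.0)
    return counts / freq[:, None]          # roughly P(concept j | concept i)

def boost_scores(visual_scores, P, alpha=0.3):
    """Raise a concept's score when concepts that frequently co-occur
    with it have strong visual evidence in the test image."""
    s = np.asarray(visual_scores, dtype=float)
    support = P.T @ s                      # evidence flowing from co-occurrers
    support /= support.max() + 1e-12       # keep on a comparable scale
    return (1 - alpha) * s + alpha * support

# Toy example: concepts 0 and 1 always co-occur in training, so strong
# visual evidence for concept 1 lifts concept 0.
train = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1]])
P = cooccurrence_matrix(train)
print(boost_scores([0.2, 0.9, 0.1], P))
```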
6. Image auto-annotation via tag-dependent random search over range-constrained visual neighbours.
- Author
- Lin, Zijia, Ding, Guiguang, and Hu, Mingqing
- Subjects
- IMAGE analysis, CONSTRAINT satisfaction, DIGITAL photography, INTERNET users, ONLINE social networks
- Abstract
The quantity setting of visual neighbours can be critical for the performance of many previously proposed visual-neighbour-based (VNB) image auto-annotation methods. In those methods, each candidate tag of a to-be-annotated image is better served by its own trustworthy subset of visual neighbours for score prediction. Hence, in this paper we propose to use a constrained range, rather than an identical and fixed number, of visual neighbours for VNB methods, allowing more flexible choices of neighbours, and then put forward a novel tag-dependent random search process to estimate the tag-dependent trust degrees of visual neighbours for each candidate tag. We further propose an effective image auto-annotation method termed TagSearcher, based on a widely used conditional probability model for auto-annotation, which considers image-dependent weights of visual neighbours, tag-dependent trust degrees of visual neighbours, and votes for a candidate tag from visual neighbours. Extensive experiments conducted on both a benchmark dataset and real-world web images show that the proposed TagSearcher yields promising annotation performance and also reduces the performance sensitivity to the quantity setting of visual neighbours. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
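The two ingredients this abstract names, a constrained range of neighbours instead of one fixed k, and tag-dependent trust degrees feeding a weighted vote, can be sketched as follows. The trust degrees are taken here as given inputs; estimating them is what the paper's random-search process does, and the distance weighting is an illustrative choice.

```python
# Sketch of range-constrained neighbour voting (the tag-dependent trust
# degrees would come from the paper's random-search process; here they
# are simply passed in).
import numpy as np

def range_constrained_neighbours(dists, radius, k_min=5, k_max=50):
    """Keep neighbours within `radius`, but clamp the count to a range
    rather than using one fixed k for every image."""
    order = np.argsort(dists)
    within = order[dists[order] <= radius]
    if len(within) < k_min:
        return order[:k_min]
    return within[:k_max]

def predict_tag_score(tag, neighbours, neighbour_tags, dists, trust):
    """Weighted vote: image-dependent weight (distance) x tag-dependent
    trust degree x binary vote from each neighbour."""
    score = 0.0
    for n in neighbours:
        vote = 1.0 if tag in neighbour_tags[n] else 0.0
        weight = np.exp(-dists[n])         # closer neighbours count more
        score += weight * trust.get((tag, n), 1.0) * vote
    return score

# Toy usage with 4 training images
dists = np.array([0.1, 0.4, 0.9, 2.0])
tags = {0: {"sky", "sea"}, 1: {"sky"}, 2: {"car"}, 3: {"sky"}}
nbrs = range_constrained_neighbours(dists, radius=1.0, k_min=2, k_max=3)
print(predict_tag_score("sky", nbrs, tags, dists, trust={}))
```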
7. Variational learning of a Dirichlet process of generalized Dirichlet distributions for simultaneous clustering and feature selection.
- Author
- Fan, Wentao and Bouguila, Nizar
- Subjects
- MACHINE learning, DIRICHLET forms, GENERALIZATION, CLUSTER analysis (Statistics), FEATURE selection, DISTRIBUTION (Probability theory), PARAMETER estimation, SEMANTICS
- Abstract
This paper introduces a novel enhancement for unsupervised feature selection based on generalized Dirichlet (GD) mixture models. Our proposal extends the finite mixture model previously developed in [1] to the infinite case via Dirichlet process mixtures, which can be viewed as a purely nonparametric model since the number of mixture components can increase as data are introduced. The infinite assumption avoids problems related to model selection (i.e. determining the number of clusters) and allows simultaneous separation of data into similar clusters and selection of relevant features. The resulting model is learned within a principled variational Bayesian framework that we have developed. Experimental results on both synthetic data and challenging real-world applications involving image categorization, automatic semantic annotation and retrieval show the ability of our approach to provide accurate models by distinguishing between relevant and irrelevant features without over- or under-fitting the data. [Copyright Elsevier]
- Published
- 2013
- Full Text
- View/download PDF
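The paper's generalized Dirichlet components and feature-selection terms require a custom variational implementation, but the core "infinite mixture" behaviour, letting the data decide how many components are actually used, can be demonstrated with scikit-learn's variational Dirichlet-process Gaussian mixture as a stand-in (Gaussian components here, not the paper's GD components).

```python
# Stand-in demonstration: a variational Dirichlet-process mixture prunes
# unused components automatically, so the truncation level n_components
# is an upper bound rather than the answer.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(200, 2)),   # cluster 1
    rng.normal(loc=4.0, scale=0.5, size=(200, 2)),   # cluster 2
])

dpgmm = BayesianGaussianMixture(
    n_components=10,                                  # truncation level
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.1,                   # small alpha -> fewer clusters
    max_iter=500,
    random_state=0,
).fit(X)

# Components with non-negligible weight are the "used" clusters; the rest
# are driven to near-zero weight by the variational updates.
print(np.round(dpgmm.weights_, 3))
```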
8. Image auto-annotation with automatic selection of the annotation length.
- Author
- Maier, Oskar, Kwasnicka, Halina, and Stanek, Michal
- Subjects
- IMAGE retrieval, SEARCH engines, INTERNET searching, AUTOMATIC control systems, MACHINE learning, PROGRAM transformation
- Abstract
Developing a satisfactory and effective method for auto-annotating images that works under general conditions is a challenging task. The advantages of such a system would be manifold: it can be used to annotate existing large databases of images, rendering them accessible to text search engines, or it can serve as the core of image retrieval based on a query image's visual content. Manual annotation of images is a difficult, tedious and time-consuming task. Furthermore, manual annotations tend to show great inter-person variance: for a given image, opinions about which elements are significant and deserve an annotation vary strongly. The latter poses a problem for the evaluation of an automatic method, as an annotation's correctness is largely subjective. In this paper we present an automatic method for annotating images which addresses one of the existing methods' major limitations, namely a fixed annotation length. The proposed method, PATSI, automatically chooses the resulting annotation's length for each query image. It is kept as simple as possible, and a built-in parameter optimization procedure renders PATSI de facto parameter-free. Finally, PATSI is evaluated on standard datasets, outperforming various state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
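PATSI's headline feature, a per-image annotation length, amounts to transferring tag scores from visual neighbours and keeping every tag that clears a threshold, so the number of tags varies from image to image. The sketch below illustrates only that thresholding idea; the rank-based transfer weights and the threshold value are placeholders, not PATSI's optimized settings.

```python
# Sketch of per-image annotation-length selection via score thresholding.
import numpy as np
from collections import defaultdict

def annotate(query_dists, neighbour_tags, threshold=0.5, k=5):
    order = np.argsort(query_dists)[:k]
    scores = defaultdict(float)
    for rank, n in enumerate(order):
        for tag in neighbour_tags[n]:
            scores[tag] += 1.0 / (rank + 1)   # rank-based transfer weight
    top = max(scores.values())
    # The annotation length is simply whatever survives the threshold.
    return [t for t, s in scores.items() if s >= threshold * top]

neighbour_tags = {0: {"sky", "sea"}, 1: {"sky"}, 2: {"tree"}, 3: {"sky", "sun"}}
print(annotate(np.array([0.2, 0.3, 0.8, 0.4]), neighbour_tags, k=4))
```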
9. A Study of Quality Issues for Image Auto-Annotation With the Corel Dataset.
- Author
- Tang, Jiayu and Lewis, Paul H.
- Subjects
- AUTOMATION, IMAGE processing, JPEG (Image coding standard), IMAGE compression standards, MACHINE learning, MULTIMEDIA systems
- Abstract
The Corel Image set is widely used for image annotation performance evaluation although it has been claimed that Corel images are relatively easy to annotate. The aim of this paper is to demonstrate some of the disadvantages of datasets like the Corel set for effective auto-annotation evaluation. We first compare the performance of several annotation algorithms using the Corel set and find that simple near neighbor propagation techniques perform fairly well. A support vector machine (SVM)-based annotation method achieves even better results, almost as good as the best found in the literature. We then build a new image collection using the Yahoo Image Search engine and query-by-single-word searches to create a more challenging annotated set automatically. Then, using three very different image annotation methods, we demonstrate some of the problems of annotation using the Corel set compared with the Yahoo-based training set. In both cases the training sets are used to create a set of annotations for the Corel test set. [ABSTRACT FROM AUTHOR]
- Published
- 2007
- Full Text
- View/download PDF
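The SVM-based annotation method this abstract mentions is, in its generic form, a one-vs-rest classifier per tag over a binary tag matrix. A hedged scikit-learn sketch of that generic formulation follows; the random stand-in features, RBF kernel, tag names and 0.5 decision threshold are placeholders, not the paper's setup.

```python
# Generic one-vs-rest SVM annotation: one binary SVM per tag.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X_train = rng.normal(size=(100, 32))          # stand-in visual features
Y_train = rng.integers(0, 2, size=(100, 4))   # binary tag matrix, 4 tags

clf = OneVsRestClassifier(SVC(kernel="rbf", probability=True))
clf.fit(X_train, Y_train)

# Annotate a new image with every tag whose probability exceeds 0.5
probs = clf.predict_proba(rng.normal(size=(1, 32)))[0]
tags = ["sky", "sea", "tree", "car"]
print([t for t, p in zip(tags, probs) if p > 0.5])
```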
10. Word sense disambiguation with pictures
- Author
- Barnard, Kobus and Johnson, Matthew
- Subjects
- ART, AESTHETICS, SENSES, ALGORITHMS
- Abstract
We introduce using images for word sense disambiguation, either alone or in conjunction with traditional text-based methods. The approach is based on a recently developed method for automatically annotating images using a statistical model of the joint probability of image regions and words. The model itself is learned from a database of images with associated text. To use the model for word sense disambiguation, we constrain the predicted words to be possible senses of the word under consideration. When word prediction is constrained to a narrow set of choices (such as possible senses), it can be quite reliable. We report on experiments using the resulting sense probabilities as is, as well as for augmenting a state-of-the-art text-based word sense disambiguation algorithm. To evaluate our approach, we developed a new corpus, ImCor, which consists of a substantive portion of the Corel image dataset associated with disambiguated text drawn from the SemCor corpus. Our experiments using this corpus suggest that visual information can be very useful in disambiguating word senses. They also illustrate that associated non-textual information such as image data can help ground language meaning. [Copyright Elsevier]
- Published
- 2005
- Full Text
- View/download PDF
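The key step in this abstract, constraining image-driven word predictions to the candidate senses of an ambiguous word and renormalising, is easy to sketch. The product fusion with a text-based WSD score below is an assumption for illustration; the paper evaluates both the raw sense probabilities and a combination with a text-based algorithm.

```python
# Sketch of the sense-restriction step: take word probabilities predicted
# from image regions, keep only entries that are candidate senses of the
# ambiguous word, renormalise, and optionally fuse with text-WSD scores.
def disambiguate(word_probs, candidate_senses, text_scores=None):
    restricted = {s: word_probs.get(s, 0.0) for s in candidate_senses}
    if text_scores:                        # optional fusion (an assumption)
        restricted = {s: p * text_scores.get(s, 1.0)
                      for s, p in restricted.items()}
    total = sum(restricted.values())
    if total == 0:
        return None                        # no visual evidence either way
    probs = {s: p / total for s, p in restricted.items()}
    return max(probs, key=probs.get)

# "bank": an image of a river strongly supports the shore sense
image_word_probs = {"bank_river": 0.7, "bank_finance": 0.05, "water": 0.2}
print(disambiguate(image_word_probs, ["bank_river", "bank_finance"]))
```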
11. Crossing textual and visual content in different application scenarios
- Author
- Ah-Pine, Julien, Bressan, Marco, Clinchant, Stephane, Csurka, Gabriela, Hoppenot, Yves, and Renders, Jean-Michel
- Published
- 2009
- Full Text
- View/download PDF
12. Word sense disambiguation with pictures
- Author
- Barnard, Kobus and Johnson, Matthew
- Subjects
- Linguistics and Language, Computer science, Speech recognition, Word sense disambiguation, Statistical model, Image auto-annotation, Meaning (non-linguistic), Language and Linguistics, SemEval, Conjunction (grammar), Statistical models, Word lists by frequency, Artificial intelligence, Region labeling, Set (psychology), Connected-component labeling, Word (computer architecture), Natural language processing
- Abstract
We introduce using images for word sense disambiguation, either alone or in conjunction with traditional text-based methods. The approach is based on a recently developed method for automatically annotating images using a statistical model of the joint probability of image regions and words. The model itself is learned from a database of images with associated text. To use the model for word sense disambiguation, we constrain the predicted words to be possible senses of the word under consideration. When word prediction is constrained to a narrow set of choices (such as possible senses), it can be quite reliable. We report on experiments using the resulting sense probabilities as is, as well as for augmenting a state-of-the-art text-based word sense disambiguation algorithm. To evaluate our approach, we developed a new corpus, ImCor, which consists of a substantive portion of the Corel image dataset associated with disambiguated text drawn from the SemCor corpus. Our experiments using this corpus suggest that visual information can be very useful in disambiguating word senses. They also illustrate that associated non-textual information such as image data can help ground language meaning.
- Published
- 2005
- Full Text
- View/download PDF
13. Variational learning of a Dirichlet process of generalized Dirichlet distributions for simultaneous clustering and feature selection
- Subjects
- Dirichlet process, Generalized Dirichlet, Feature selection, Image auto-annotation, Infinite mixture models, Image categorization, Clustering
- Published
- 2013
14. İç mekan tanıma için en yakın komşuya dayalı metrik fonksiyonlar (Nearest-neighbor-based metric functions for indoor scene recognition)
- Author
- Çakır, Fatih, Ulusoy, Özgür, Güdükbay, Uğur, and others
- Subjects
- Signal processing--Digital techniques, Image classification, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, TA1634 .C35 2011, Image recognition, image auto-annotation, Object recognition, Computer vision, Computer Engineering and Computer Science and Control, Image processing--Digital techniques, scene classification, indoor scene recognition, Labelling, bag-of-visual-words, nearest neighbor classifier, Pattern recognition systems
- Abstract
Indoor scene recognition is a challenging problem in the classical scene recognition domain due to the severe intra-class variations and inter-class similarities of man-made indoor structures. State-of-the-art scene recognition techniques, such as capturing holistic representations of an image, demonstrate low performance on indoor scenes. Other methods that introduce intermediate steps, such as identifying objects and associating them with scenes, have the handicap of successfully localizing and recognizing the objects in a highly cluttered and sophisticated environment. We propose a classification method that can handle such difficulties of the problem domain by employing a metric function based on the nearest-neighbor classification procedure using the bag-of-visual-words scheme, the so-called codebooks. Considering the codebook construction as a Voronoi tessellation of the feature space, we have observed that, given an image, a learned weighted distance of the extracted feature vectors to the centers of the Voronoi cells gives a strong indication of the image's category. Our method outperforms state-of-the-art approaches on an indoor scene recognition benchmark and achieves competitive results on a general scene dataset, using a single type of descriptor. Although our primary focus in this study is indoor scene categorization, we also employ the proposed metric function to create a baseline implementation for the auto-annotation problem. With the growing amount of digital media, the problem of auto-annotating images with semantic labels has received significant interest from researchers in the last decade. Traditional approaches, where such content is manually tagged, are too tedious and time-consuming. Hence, successfully labeling images with keywords describing their semantics is a crucial task yet to be accomplished.
- Published
- 2011
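The descriptor this abstract describes, viewing the codebook as a Voronoi tessellation of feature space and measuring weighted distances of an image's features to the cell centers, can be sketched as follows. The per-cell weights are the learned part of the thesis's method; here they are passed in as constants, and the k-means codebook and random features are stand-ins.

```python
# Sketch of a weighted distance-to-Voronoi-centers image signature.
import numpy as np
from sklearn.cluster import KMeans

def codebook_signature(local_feats, codebook, weights):
    # distance of every local feature to every Voronoi centre
    d = np.linalg.norm(local_feats[:, None, :] - codebook[None, :, :], axis=2)
    nearest = d.min(axis=0)                  # closest approach per cell
    return weights * nearest                 # weighted-distance signature

rng = np.random.default_rng(2)
train_feats = rng.normal(size=(500, 16))     # stand-in local descriptors
km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(train_feats)

img_feats = rng.normal(size=(40, 16))        # one image's local features
sig = codebook_signature(img_feats, km.cluster_centers_, np.ones(8))
print(sig.round(2))   # nearest-neighbour classification compares signatures
```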
15. Données multimodales pour l'analyse d'image (Multimodal data for image analysis)
- Author
- Guillaumin, Matthieu (Learning and recognition in vision (LEAR) / THOTH, Inria Grenoble - Rhône-Alpes; Laboratoire Jean Kuntzmann (LJK); GRAVIR - IMAG; Institut National Polytechnique de Grenoble (INPG); Université Joseph Fourier - Grenoble 1 (UJF); CNRS) and Cordelia Schmid
- Subjects
- Weakly supervised learning, Metric learning, Image auto-annotation, Object recognition, Multiple instance metric learning, image retrieval, Multimodal semi-supervised learning, Constrained clustering, Nearest neighbour models, Keyword-based, [INFO.INFO-HC] Computer Science [cs]/Human-Computer Interaction [cs.HC], Face recognition, Face verification
This dissertation delves into the use of textual metadata for image understanding. We seek to exploit this additional textual information as weak supervision to improve the learning of recognition models. There is a recent and growing interest in methods that exploit such data because they can potentially alleviate the need for manual annotation, which is a costly and time-consuming process. We focus on two types of visual data with associated textual information. First, we exploit news images that come with descriptive captions to address several face-related tasks, including face verification, the task of deciding whether two images depict the same individual, and face naming, the problem of associating faces in a dataset with their correct names. Second, we consider data consisting of images with user tags. We explore models for automatically predicting tags for new images, i.e. image auto-annotation, which can also be used for keyword-based image search. We also study a multimodal semi-supervised learning scenario for image categorisation. In this setting, the tags are assumed to be present in both labelled and unlabelled training data, while they are absent from the test data. Our work builds on the observation that most of these tasks can be solved if perfectly adequate similarity measures are used. We therefore introduce novel approaches that involve metric learning, nearest neighbour models and graph-based methods to learn, from the visual and textual data, task-specific similarities. For faces, our similarities focus on the identities of the individuals while, for images, they address more general semantic visual concepts. Experimentally, our approaches achieve state-of-the-art results on several standard and challenging datasets. On both types of data, we clearly show that learning using additional textual information improves the performance of visual recognition systems.
- Published
- 2010
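The common thread of this dissertation is learning a task-specific similarity from visual and textual data. As a cartoon of pair-based metric learning, the sketch below fits per-dimension weights so that same-label pairs end up closer than different-label pairs; the diagonal parameterisation, hinge loss and optimiser are illustrative choices, not those of the thesis.

```python
# Cartoon of pair-based metric learning: fit per-dimension weights so
# same-label pairs come out closer than different-label pairs.
import torch

def learn_diag_metric(X, y, epochs=200, lr=0.1, margin=1.0):
    w = torch.zeros(X.shape[1], requires_grad=True)   # log-weights
    opt = torch.optim.SGD([w], lr=lr)
    same = (y[:, None] == y[None, :]).float()
    for _ in range(epochs):
        # squared weighted distance between every pair of points
        d = ((X[:, None, :] - X[None, :, :]) ** 2 * w.exp()).sum(-1)
        # hinge: pull same-label pairs together, push others past the margin
        loss = (same * d + (1 - same) * (margin - d).clamp(min=0)).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return w.exp().detach()

X = torch.randn(30, 5)
y = torch.randint(0, 3, (30,))
print(learn_diag_metric(X, y))   # learned per-dimension weights
```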