1. Learning Cross-Modal Aligned Representation With Graph Embedding
- Author
Jiayan Cao, Youcai Zhang, and Xiaodong Gu
- Subjects
semi-supervised learning ,General Computer Science ,neural network ,Computer science ,Graph embedding ,Feature extraction ,02 engineering and technology ,Machine learning ,computer.software_genre ,Data modeling ,cross-modal retrieval ,Discriminative model ,Graph embedding learning ,0202 electrical engineering, electronic engineering, information engineering ,General Materials Science ,Contextual image classification ,Artificial neural network ,business.industry ,General Engineering ,020207 software engineering ,Embedding ,020201 artificial intelligence & image processing ,lcsh:Electrical engineering. Electronics. Nuclear engineering ,Artificial intelligence ,Laplacian matrix ,business ,lcsh:TK1-9971 ,computer
- Abstract
The main task of cross-modal analysis is to learn a discriminative representation shared across different modalities. To obtain an aligned representation, conventional approaches either construct and optimize a linear projection or train a complex deep architecture, yet it is difficult to balance accuracy and efficiency when modeling multimodal data. This paper proposes a novel graph-embedding learning framework implemented by neural networks. The learned embedding directly approximates the cross-modal aligned representation and is used for cross-modal retrieval and for image classification that incorporates text information. The proposed framework extracts the learned representation from a graph model and simultaneously trains a classifier under semi-supervised settings. For optimization, unlike previous methods based on graph Laplacian regularization, a sampling strategy generates training pairs so as to fully explore inter-modal and intra-modal similarity relationships. Experimental results on various datasets show that the proposed framework outperforms other state-of-the-art methods on cross-modal retrieval. The framework also demonstrates convincing improvements on the new task of image classification incorporating text information on the Wiki dataset.
- Published
- 2018