
Multi-label double-layer learning for cross-modal retrieval.

Authors :
He, Jianfeng
Ma, Bingpeng
Wang, Shuhui
Liu, Yugui
Huang, Qingming
Source :
Neurocomputing. Jan 2018, Vol. 275, pp. 1893-1902. 10p.
Publication Year :
2018

Abstract

This paper proposes a novel method named Multi-label Double-layer Learning (MDLL) for the multi-label cross-modal retrieval task. MDLL comprises two stages (layers): L2C (Label to Common) and C2L (Common to Label). In the L2C stage, since labels provide semantic information, we treat label information as an auxiliary modality and use a covariance matrix to represent label similarity in the multi-label setting. This allows us to maximize the correlation between different modalities and reduce the semantic gap between them in the L2C stage. In addition, we observe that samples with the same semantic labels may still differ in content from the user's point of view. To address this problem, in the C2L stage, labels are projected into a latent space learned from image and text features, so that the label latent space is more closely related to the samples' contents. As a result, the discrepancy among samples that share labels but differ in content is reduced. In MDLL, iterating between the L2C and C2L stages greatly improves discriminative ability and narrows the gap between labels and contents. To show the effectiveness of MDLL, experiments are conducted on three multi-label cross-modal retrieval datasets (PASCAL VOC 2007, NUS-WIDE, and LabelMe), on which competitive results are obtained. [ABSTRACT FROM AUTHOR]
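
This record gives only the abstract, not MDLL's formulation, but two ingredients the abstract names, a label covariance matrix as a multi-label similarity measure and a correlation-maximized common space, can be sketched with standard tools. The Python sketch below is illustrative and is not the authors' method: it builds a label covariance from invented multi-hot label vectors, and it substitutes classical CCA for MDLL's actual objective to project image and text features into a shared latent space. All data, dimensions, and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_img, d_txt, k = 200, 64, 32, 5

# Invented stand-in data: image features, text features, multi-hot labels.
X_img = rng.normal(size=(n, d_img))
X_txt = rng.normal(size=(n, d_txt))
Y = (rng.random((n, k)) < 0.3).astype(float)

# Label covariance matrix: a k x k summary of label co-occurrence that can
# act as a soft similarity between multi-hot label vectors (two samples
# score as similar when their labels tend to co-occur, even if their exact
# label sets differ).
Yc = Y - Y.mean(axis=0)
C = Yc.T @ Yc / (n - 1)
sim_0_3 = Y[0] @ C @ Y[3]  # covariance-weighted label similarity

def cca(X, Z, dim, reg=1e-4):
    """Classical CCA (a stand-in for MDLL's correlation objective):
    find projections of X and Z whose images are maximally correlated."""
    m = len(X)
    Xc, Zc = X - X.mean(axis=0), Z - Z.mean(axis=0)
    Cxx = Xc.T @ Xc / m + reg * np.eye(X.shape[1])
    Czz = Zc.T @ Zc / m + reg * np.eye(Z.shape[1])
    Cxz = Xc.T @ Zc / m
    Kx = np.linalg.inv(np.linalg.cholesky(Cxx)).T  # whitener, Cxx^{-1/2}
    Kz = np.linalg.inv(np.linalg.cholesky(Czz)).T
    U, _, Vt = np.linalg.svd(Kx.T @ Cxz @ Kz)
    return Kx @ U[:, :dim], Kz @ Vt.T[:, :dim]

W_img, W_txt = cca(X_img, X_txt, dim=4)
Z_img = (X_img - X_img.mean(axis=0)) @ W_img  # images in the common space
Z_txt = (X_txt - X_txt.mean(axis=0)) @ W_txt  # texts in the common space

# Cross-modal retrieval: rank all texts for one image query by cosine
# similarity in the common space.
q = Z_img[0]
scores = Z_txt @ q / (np.linalg.norm(Z_txt, axis=1) * np.linalg.norm(q) + 1e-12)
ranking = np.argsort(-scores)
print("top-5 texts for image 0:", ranking[:5])
```

In this sketch, retrieval is a cosine-similarity ranking in the shared space; the abstract's C2L stage, which projects labels into a latent space learned from the image and text features, would add a further mapping from label vectors into those same coordinates, which is not reproduced here.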

Details

Language :
English
ISSN :
0925-2312
Volume :
275
Database :
Academic Search Index
Journal :
Neurocomputing
Publication Type :
Academic Journal
Accession number :
126959236
Full Text :
https://doi.org/10.1016/j.neucom.2017.10.032