Studying the impact of the Full-Network embedding on multimodal pipelines.

Authors :
Espinosa-Anke, Luis
Declerck, Thierry
Gromann, Dagmar
Vilalta, Armand
Garcia-Gasulla, Dario
Parés, Ferran
Ayguadé, Eduard
Labarta, Jesus
Moya-Sánchez, E. Ulises
Cortés, Ulises
Source :
Semantic Web (1570-0844); 2019, Vol. 10 Issue 5, p909-923, 15p
Publication Year :
2019

Abstract

The current state of the art in image annotation and image retrieval is obtained through deep neural network multimodal pipelines, which combine an image representation and a text representation into a shared embedding space. In this paper we evaluate the impact of using the Full-Network embedding (FNE) in this setting, replacing the original image representation in four competitive multimodal embedding generation schemes. Unlike the one-layer image embeddings used by most approaches, the Full-Network embedding provides a multi-scale discrete representation of images, which results in richer characterisations. Extensive testing is performed on three different datasets, comparing the performance of the studied variants and the impact of the FNE on a level playing field, i.e., with the same data, source CNN models and hyper-parameter tuning. The results indicate that the Full-Network embedding is consistently superior to the one-layer embedding. Furthermore, its impact on performance exceeds the improvement stemming from the other variants studied. These results motivate the integration of the Full-Network embedding into any multimodal embedding generation scheme. [ABSTRACT FROM AUTHOR]
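
The abstract calls the Full-Network embedding a "multi-scale discrete representation" but does not give the procedure. The following is a minimal illustrative sketch of one way such a representation could be built, not the authors' implementation. It assumes that activations from several CNN layers are already available as NumPy arrays, that convolutional maps are reduced by spatial average pooling, and that standardized features are discretized into {-1, 0, 1}. The function name `full_network_embedding` and the thresholds `low` and `high` are hypothetical choices made for illustration.

```python
import numpy as np


def full_network_embedding(layer_activations, low=-0.25, high=0.15):
    """Build a discrete multi-layer image embedding (illustrative sketch).

    layer_activations: list of arrays, one per CNN layer, each shaped
    (n_images, channels) or (n_images, height, width, channels).
    low, high: assumed discretization thresholds (hypothetical defaults).
    """
    pooled = []
    for acts in layer_activations:
        acts = np.asarray(acts, dtype=float)
        if acts.ndim == 4:
            # Spatial average pooling: one value per channel and image.
            acts = acts.mean(axis=(1, 2))
        pooled.append(acts)

    # Concatenate features from all layers (the "multi-scale" part).
    features = np.concatenate(pooled, axis=1)

    # Standardize each feature across the set of images.
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    standardized = (features - mean) / std

    # Discretize into {-1, 0, 1} (the "discrete" part).
    embedding = np.zeros(standardized.shape, dtype=np.int8)
    embedding[standardized > high] = 1
    embedding[standardized < low] = -1
    return embedding


# Usage with random stand-ins for real CNN activations.
rng = np.random.default_rng(0)
conv_layer = rng.normal(size=(8, 4, 4, 16))  # 8 images, 4x4 maps, 16 channels
dense_layer = rng.normal(size=(8, 32))       # 8 images, 32 units
emb = full_network_embedding([conv_layer, dense_layer])
print(emb.shape)
```

In a multimodal pipeline of the kind the abstract describes, a vector like `emb` would take the place of the single-layer image feature fed to the joint image-text embedding model.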

Details

Language :
English
ISSN :
1570-0844
Volume :
10
Issue :
5
Database :
Complementary Index
Journal :
Semantic Web (1570-0844)
Publication Type :
Academic Journal
Accession number :
138696096
Full Text :
https://doi.org/10.3233/SW-180341