Fusion of Multimodal Embeddings for Ad-Hoc Video Search

Authors :: Benoit Huet; Chong-Wah Ngo; Danny Francis; Phuong Anh Nguyen
Source :: 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), ICCV Workshops
Publication Year :: 2019
Abstract :: The challenge of Ad-Hoc Video Search (AVS) originates from free-form (i.e., no pre-defined vocabulary) and free-style (i.e., natural language) query descriptions. Bridging the semantic gap between AVS queries and videos is highly difficult, as evidenced by the low retrieval accuracy of AVS benchmarking in TRECVID. In this paper, we study a new method to fuse multimodal embeddings that have been derived from completely disjoint datasets. This method is tested on two datasets for two distinct tasks: on MSR-VTT for unique video retrieval and on V3C1 for multiple-video retrieval.

ISBN :: 978-1-72815-023-9
ISBNs :: 9781728150239
Database :: OpenAIRE
Journal :: 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)
Accession number :: edsair.doi.dedup.....0e44f0fdd1056bb67f7ed1f879b73ef4
Full Text :: https://doi.org/10.1109/iccvw.2019.00233