
Semantic Enhanced Video Captioning with Multi-feature Fusion.

Authors :: TIAN-ZI NIU
SHAN-SHAN DONG
ZHEN-DUO CHEN
XIN LUO
SHANQING GUO
ZI HUANG
XIN-SHUN XU
Source :: ACM Transactions on Multimedia Computing, Communications & Applications; Nov 2023, Vol. 19 Issue 6, p1-21, 21p
Publication Year :: 2023
Abstract: Video captioning aims to automatically describe a video clip with informative sentences. At present, deep learning-based models have become the mainstream for this task and achieved competitive results on public datasets. Usually, these methods leverage different types of features to generate sentences, e.g., semantic information, 2D or 3D features. However, some methods only treat semantic information as a complement to visual representations and cannot fully exploit it; some of them ignore the relationship between different types of features. In addition, most of them select multiple frames of a video with an equally spaced sampling scheme, which introduces much redundant information. To address these issues, we present a novel video-captioning framework, Semantic Enhanced video captioning with Multi-feature Fusion, SEMF for short. It optimizes the use of different types of features from three aspects. First, a semantic encoder is designed to enhance meaningful semantic features through a semantic dictionary to boost performance. Second, a discrete selection module pays attention to important features and obtains different contexts at different steps to reduce feature redundancy. Finally, a multi-feature fusion module uses a novel relation-aware attention mechanism to separate the common and complementary components of different features to provide more effective visual features for the next step. Moreover, the entire framework can be trained in an end-to-end manner. Extensive experiments are conducted on the Microsoft Research Video Description Corpus (MSVD) and MSR-Video to Text (MSR-VTT) datasets. The results demonstrate that SEMF achieves state-of-the-art performance. [ABSTRACT FROM AUTHOR]
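The abstract contrasts SEMF's discrete selection with the common practice of sampling frames at equal intervals. As a rough illustration only, the sketch below shows equally spaced frame-index sampling next to a hypothetical score-based top-k selection. This is not the authors' code, and the record does not describe how SEMF scores or selects features; the function names and the per-frame `scores` input are assumptions made for the example.

```python
from __future__ import annotations


def equally_spaced_indices(num_frames: int, num_samples: int) -> list[int]:
    """Baseline scheme: pick frame indices spread evenly over [0, num_frames)."""
    if num_frames <= 0 or num_samples <= 0:
        return []
    # Never ask for more samples than there are frames.
    num_samples = min(num_samples, num_frames)
    step = num_frames / num_samples
    # Take the centre of each equal-length segment, regardless of content,
    # so near-duplicate frames are kept whenever they fall on the grid.
    return [int(step * i + step / 2) for i in range(num_samples)]


def top_k_indices(scores: list[float], k: int) -> list[int]:
    """Hypothetical content-aware alternative (not SEMF's module):
    keep the k highest-scoring frames and return them in temporal order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    print(equally_spaced_indices(100, 4))                  # [12, 37, 62, 87]
    print(top_k_indices([0.1, 0.9, 0.3, 0.8, 0.2], k=2))   # [1, 3]
```

The uniform sampler ignores what the frames contain, which is the source of redundancy the abstract points to; a learned selector would replace the fixed `scores` list with importance weights computed by the model.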

Details

Language :: English
ISSN :: 1551-6857
Volume :: 19
Issue :: 6
Database :: Complementary Index
Journal :: ACM Transactions on Multimedia Computing, Communications & Applications
Publication Type :: Academic Journal
Accession number :: 165034643
Full Text :: https://doi.org/10.1145/3588572