CoEvo-Net: Coevolution Network for Video Highlight Detection.

Authors :: Chen, Jiawei
Wang, Jian
Wang, Xinchao
Wang, Xingen
Feng, Zunlei
Liu, Ruitao
Song, Mingli
Source :: IEEE Transactions on Circuits & Systems for Video Technology; Jun 2022, Vol. 32 Issue 6, p3788-3797, 10p
Publication Year :: 2022
Abstract: Video highlight detection (VHD) has emerged as a pressing task due to the unprecedented growth in video data, such as that from e-commerce live-broadcasting platforms. Many approaches exploit text data, in the form of video descriptions or time-sync comments, to facilitate the VHD task. Despite promising results, they have largely overlooked the noise inherent in the text data and have mostly treated text and video features in isolation. In this paper, we introduce a novel model for VHD, termed Coevolution Network (CoEvo-Net), that explicitly learns language and video features jointly via a coevolution paradigm, in which features from the two modalities progressively refine each other. This is achieved by a dedicated CoEvo-Cell that takes language and video together as inputs, extracts cross-modality information, and filters out the undesired parts of the input, such as words in a sentence. Furthermore, we release a large-scale e-commerce dataset for VHD, in which each video is paired with a descriptive sentence, to benchmark sentence-based VHD approaches. Extensive experiments on the released dataset demonstrate that CoEvo-Net achieves state-of-the-art performance. Our dataset and code will be made publicly available. [ABSTRACT FROM AUTHOR]