1. An encoder-decoder model for video captioning using RESNET and GRU.
- Author
- Preethi, A. and Dhanalakshmi, P.
- Subjects
- *VIDEOS, *VIDEO processing, *VIDEO coding
- Abstract
- Video captioning is the process of generating sentences that describe the visual information in a video. It is essential for video retrieval and analysis. Unlike still images, the frames in a video are temporally connected, so it is important to consider visual, temporal and grammatical information when generating captions. This is done with an encoder-decoder architecture. In the encoder module, ResNet-152 is used as a feature extractor to obtain features from the video frames. In the decoder module, LSTM and GRU are employed to generate the sentence. The architecture is trained and tested on the benchmark Microsoft Video Description Corpus (MSVD) dataset, and performance is evaluated using BLEU, METEOR and CIDEr. [ABSTRACT FROM AUTHOR]
- Published
- 2023
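The abstract names BLEU as one of the evaluation metrics. As a rough illustration of what that metric measures, the following is a minimal sketch of unsmoothed sentence-level BLEU (clipped n-gram precision, geometric mean, brevity penalty). It is not the authors' evaluation code: the function names and the example captions are hypothetical, and reported scores are normally computed over a whole corpus with a standard toolkit. METEOR and CIDEr are not covered here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it occurs in any single reference caption."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform weights over 1..max_n grams."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero precision gives a zero score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: compare against the reference length closest to the
    # candidate length (ties resolved toward the shorter reference).
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_mean)


# Hypothetical captions, made up for illustration only.
references = ["a man is playing a guitar".split(),
              "someone plays the guitar".split()]
candidate = "a man is playing a guitar".split()
score = sentence_bleu(candidate, references)
```

Clipping is what keeps a degenerate caption such as "the the the the" from scoring well: against the reference "the cat is on the mat", only two of its four unigrams are credited.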