Video captioning with stacked attention and semantic hard pull
- Source :
- PeerJ Computer Science, Vol 7, p e664 (2021)
- Publication Year :
- 2020
Abstract
- Video captioning, i.e., the task of generating captions from video sequences, creates a bridge between the Natural Language Processing and Computer Vision domains of computer science. The task of generating a semantically accurate description of a video is quite complex. Considering the complexity of the problem, the results obtained in recent research works are praiseworthy. However, there is plenty of scope for further investigation. This paper addresses this scope and proposes a novel solution. Most video captioning models comprise two sequential/recurrent layers: one as a video-to-context encoder and the other as a context-to-caption decoder. This paper proposes a novel architecture, namely Semantically Sensible Video Captioning (SSVC), which modifies the context generation mechanism by using two novel approaches: “stacked attention” and “spatial hard pull”. As there are no exclusive metrics for evaluating video captioning models, we emphasize both quantitative and qualitative analysis of our model. Hence, we have used the BLEU scoring metric for quantitative analysis and have proposed a human evaluation metric for qualitative analysis, namely the Semantic Sensibility (SS) scoring metric. The SS Score overcomes the shortcomings of common automated scoring metrics. This paper reports that the use of the aforementioned novelties improves the performance of state-of-the-art architectures.
- Subjects :
- FOS: Computer and information sciences
Closed captioning
General Computer Science
Video captioning
Computer science
Computer Vision and Pattern Recognition (cs.CV)
Computer Vision
Data Mining and Machine Learning
Computer Science - Computer Vision and Pattern Recognition
Context (language use)
02 engineering and technology
computer.software_genre
Bridge (nautical)
Stacked attention
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
Sequence to sequence
business.industry
Spatial Hard Pull
QA75.5-76.95
Natural Language and Speech
Task (computing)
Human–Computer Interaction
Quantitative analysis (finance)
Electronic computers. Computer science
Metric (mathematics)
020201 artificial intelligence & image processing
Artificial intelligence
LSTM
business
Encoder
computer
Scope (computer science)
Natural language processing
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- PeerJ Computer Science, Vol 7, p e664 (2021)
- Accession number :
- edsair.doi.dedup.....27d1e9fcb309bbc80bed39b352e4c5bc