Image Captioning Encoder–Decoder Models Using CNN-RNN Architectures: A Comparative Study.

Authors :
Suresh, K. Revati
Jarapala, Arun
Sudeep, P. V.
Source :
Circuits, Systems & Signal Processing. Oct 2022, Vol. 41 Issue 10, p5719-5742. 24p.
Publication Year :
2022

Abstract

An image caption generator produces syntactically and semantically correct sentences that describe the scene in a natural image. The neural image caption (NIC) generator is a popular deep learning model for automatically generating image captions in plain English. It combines a convolutional neural network (CNN) encoder with a long short-term memory (LSTM) decoder. This paper investigates the performance of different CNN encoders and recurrent neural network decoders to find the best NIC generator model for image captioning. We also test the image caption generators with four image inject models and with decoding strategies such as greedy search and beam search. We conducted experiments on the Flickr8k dataset and analyzed the results qualitatively and quantitatively. Our results show that an automated image caption generator with a ResNet-101 encoder and an LSTM or gated recurrent unit (GRU) decoder outperforms the popular NIC generator when used with par-inject concatenate conditioning and beam search. For quantitative assessment, we used ROUGE-L, CIDEr-D, and BLEU-n scores to compare the different models. [ABSTRACT FROM AUTHOR]
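
The abstract contrasts greedy search with beam search as decoding strategies. The following is a minimal sketch of that difference, not the authors' implementation: `toy_next_token_probs` and its probability table are hypothetical stand-ins for a trained decoder, and a beam width of 1 is used to represent greedy search.

```python
import math

# Hypothetical stand-in for a trained caption decoder: maps a token prefix
# to a next-token probability distribution. The numbers are illustrative only.
def toy_next_token_probs(prefix):
    table = {
        (): {"a": 0.6, "the": 0.4},
        ("a",): {"dog": 0.55, "cat": 0.45},
        ("the",): {"dog": 0.9, "cat": 0.1},
    }
    # Any prefix not in the table ends the caption.
    return table.get(prefix, {"<end>": 1.0})

def beam_search(next_probs, beam_width, max_len=5, end="<end>"):
    """Return (tokens, probability) of the best caption found.

    beam_width=1 reduces to greedy search.
    """
    beams = [((), 0.0)]  # (prefix, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for tok, p in next_probs(prefix).items():
                cand = (prefix + (tok,), score + math.log(p))
                if tok == end:
                    finished.append(cand)
                else:
                    candidates.append(cand)
        # Keep only the beam_width highest-scoring unfinished hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len
    best_prefix, best_score = max(finished, key=lambda c: c[1])
    return list(best_prefix), math.exp(best_score)

greedy_tokens, greedy_prob = beam_search(toy_next_token_probs, beam_width=1)
beam_tokens, beam_prob = beam_search(toy_next_token_probs, beam_width=2)
print(greedy_tokens, round(greedy_prob, 2))
print(beam_tokens, round(beam_prob, 2))
```

In this toy table, greedy search commits to the locally most probable first word and ends with a lower-probability caption than beam search with width 2, which keeps the second-best first word alive long enough to find a better overall sequence. Whether a wider beam helps on real captions depends on the model and the evaluation metric.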

Details

Language :
English
ISSN :
0278-081X
Volume :
41
Issue :
10
Database :
Academic Search Index
Journal :
Circuits, Systems & Signal Processing
Publication Type :
Academic Journal
Accession number :
158432600
Full Text :
https://doi.org/10.1007/s00034-022-02050-2