
Enhancing image captioning performance based on EfficientNet-B0 model and transformer encoder-decoder.

Authors :
Joshi, Abhisht
Alkhayyat, Ahmed
Gunwant, Harsh
Tripathi, Abhay
Sharma, Moolchand
Source :
AIP Conference Proceedings. 2024, Vol. 3919 Issue 1, p1-14. 14p.
Publication Year :
2024

Abstract

In recent years, advances in natural language processing and computer vision have converged to enable automatic image caption generation. Image captioning is the process of creating a textual description for an image. Captioning an image requires recognizing the significant objects, their properties, and their relationships within the image; additionally, it must produce sentences that are syntactically and semantically accurate. Deep learning approaches can address the complexities and difficulties associated with image captioning. This paper describes a joint model capable of automatically captioning images using EfficientNet-B0 and a transformer with multi-head attention. The model aggregates EfficientNet and a transformer into a single encoder-decoder architecture. The encoder uses EfficientNet-B0, a convolutional neural network, to generate a detailed representation of the input image by embedding it into a fixed-length vector. The decoder employs a transformer with a multi-head attention mechanism that selectively concentrates attention on particular regions of the image while predicting the sentence. The proposed model was trained on the Flickr8k dataset to maximize the probability of the target description phrase given the training images, and was evaluated using BLEU N-gram (N = 1, 2, 3, 4), METEOR, and CIDEr scores. Our studies show that the proposed model can produce captions for images automatically. [ABSTRACT FROM AUTHOR]
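As a rough illustration of the encoder-decoder pairing the abstract describes, the sketch below wires a frozen EfficientNet-B0 feature extractor to a minimal transformer-style decoder in TensorFlow/Keras. This is not the authors' implementation: the hyperparameters (embed_dim, num_heads, vocab_size, seq_len), the single decoder block, and the frozen backbone are all illustrative assumptions.

    # Minimal sketch (assumed TensorFlow/Keras; not the paper's code) of an
    # EfficientNet-B0 encoder feeding a transformer-style decoder with
    # multi-head attention. All dimensions below are illustrative guesses.
    from tensorflow import keras
    from tensorflow.keras import layers

    embed_dim, num_heads, vocab_size, seq_len = 256, 4, 10000, 25

    # Encoder: EfficientNet-B0 maps a 224x224 image to a 7x7 grid of 1280-d
    # features, flattened to 49 region vectors and projected to a
    # fixed-length embedding per region.
    cnn = keras.applications.EfficientNetB0(
        include_top=False, weights="imagenet", input_shape=(224, 224, 3))
    cnn.trainable = False  # use the CNN as a frozen feature extractor
    image_in = keras.Input(shape=(224, 224, 3))
    grid = cnn(image_in)                        # (batch, 7, 7, 1280)
    grid = layers.Reshape((-1, 1280))(grid)     # (batch, 49, 1280)
    enc_out = layers.Dense(embed_dim, activation="relu")(grid)

    # Decoder: causal self-attention over the partial caption, then
    # cross-attention so each output position attends to image regions.
    tokens_in = keras.Input(shape=(seq_len,), dtype="int32")
    x = layers.Embedding(vocab_size, embed_dim)(tokens_in)
    self_attn = layers.MultiHeadAttention(num_heads, key_dim=embed_dim)
    cross_attn = layers.MultiHeadAttention(num_heads, key_dim=embed_dim)
    x = layers.LayerNormalization()(x + self_attn(x, x, use_causal_mask=True))
    x = layers.LayerNormalization()(x + cross_attn(x, enc_out))
    x = layers.Dense(embed_dim * 2, activation="relu")(x)
    logits = layers.Dense(vocab_size)(x)        # next-token scores per position

    model = keras.Model([image_in, tokens_in], logits)
    model.compile(
        optimizer="adam",
        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True))

During training, tokens_in would hold the caption shifted right and the loss target the caption shifted left; at inference, captions are decoded token by token. The paper's actual block count, dimensions, and positional-encoding scheme may differ from this sketch.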

Details

Language :
English
ISSN :
0094-243X
Volume :
3919
Issue :
1
Database :
Academic Search Index
Journal :
AIP Conference Proceedings
Publication Type :
Conference
Accession number :
176251261
Full Text :
https://doi.org/10.1063/5.0184395