Image Captioning using Pretrained Language Models and Image Segmentation

Authors :
Bianco, S
Ferrario, G
Napoletano, P
Publication Year :
2022

Abstract

Large-scale pre-trained language models that have learned cross-modal representations from image-text pairs are becoming popular for vision-language tasks, because fine-tuning them on a specific task yields state-of-the-art results. Existing methods take image-region features as input, but these regions are extracted by an object detection model that does not handle overlapping, noisy, and ambiguous regions, which inevitably makes the features less meaningful. In this paper we propose a new way to extract region features based on image segmentation, with the goal of reducing overlap and noise. Our method is motivated by the observation that image segmentation can use a binary mask to discard irrelevant pixels and extract only the object of interest.
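
The masking idea in the abstract can be illustrated with a minimal sketch, assuming an image stored as a NumPy array and a binary mask of the same height and width. The function name masked_region and the toy data are hypothetical and are not taken from the paper; the segmentation model and the feature extractor that the method would use are not specified in this record and are not shown.

```python
import numpy as np


def masked_region(image, mask):
    """Zero out pixels outside a binary mask, then crop to the mask's bounding box.

    image: array of shape (H, W, C)
    mask:  array of shape (H, W); nonzero entries mark the object of interest
    """
    mask = np.asarray(mask).astype(bool)
    if not mask.any():
        raise ValueError("mask selects no pixels")

    # Rows and columns that contain at least one object pixel
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    top, bottom = rows[0], rows[-1] + 1
    left, right = cols[0], cols[-1] + 1

    # Suppress everything outside the object, keeping the original dtype
    isolated = image.copy()
    isolated[~mask] = 0

    return isolated[top:bottom, left:right]


# Toy data (hypothetical): a 4x5 RGB image and a non-rectangular mask
image = np.arange(4 * 5 * 3, dtype=np.uint8).reshape(4, 5, 3)
mask = np.zeros((4, 5), dtype=bool)
mask[1:3, 2:4] = True
mask[1, 2] = False  # one corner of the box is background

region = masked_region(image, mask)
```

Unlike a plain bounding-box crop, the returned region keeps background pixels inside the box at zero, so a downstream feature extractor would see only the segmented object.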

Details

Database :
OAIster
Notes :
English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1355266792
Document Type :
Electronic Resource