Word–Sentence Framework for Remote Sensing Image Captioning.

Authors :: Wang, Qi
Huang, Wei
Zhang, Xueting
Li, Xuelong
Source :: IEEE Transactions on Geoscience & Remote Sensing; Dec2021, Vol. 59 Issue 12, p10532-10543, 12p
Publication Year :: 2021
Abstract: Remote sensing image captioning (RSIC), which aims at generating a well-formed sentence for a remote sensing image, has attracted increasing attention in recent years. The general framework for RSIC is the encoder–decoder architecture, which contains two submodels: an encoder and a decoder. Although it achieves strong performance, the encoder–decoder architecture is a black-box model that lacks explainability. To overcome this drawback, in this article, we propose a new explainable word–sentence framework for RSIC. The proposed word–sentence framework consists of two parts: a word extractor and a sentence generator. The former extracts the valuable words in the given remote sensing image, while the latter organizes these words into a well-formed sentence. The proposed framework decomposes RSIC into a word classification task and a word sorting task, a decomposition that is more in line with human intuition. On the basis of the word–sentence framework, ablation experiments are conducted on three public RSIC data sets (Sydney-captions, UCM-captions, and RSICD) to explore effective network structures. To evaluate the proposed word–sentence framework objectively, we further conduct comparative experiments on these three data sets and achieve results comparable to those of encoder–decoder-based methods. [ABSTRACT FROM AUTHOR]