Remote Sensing Image Generation From Audio.

Authors :: Zheng, Zhiyuan
Chen, Jun
Zheng, Xiangtao
Lu, Xiaoqiang
Source :: IEEE Geoscience & Remote Sensing Letters; Jun2021, Vol. 18 Issue 6, p994-998, 5p
Publication Year :: 2021
Abstract: Generating images from data in other modalities has attracted much attention in cross-modal studies, since a generated image offers intuitive visual information. Unlike previous works, which generate an image from text, a novel task is introduced: generating an image from audio. However, a semantic gap intrinsically exists between cross-modal data, which degrades the generated results. To explore the relevance between audio and image, a novel reranking audio-image translation method is proposed. The proposed method: 1) maps the audio and image into a uniform feature space; 2) designs an audio-audio matching network to match related audio; and 3) adopts an audio-image matching network for every matched audio to generate a related image, with the most frequent image voted as the final result. Extensive experiments on two remote sensing cross-modal data sets demonstrate that the proposed method can visualize the content of audio. [ABSTRACT FROM AUTHOR]
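
The three steps listed in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract describes learned matching networks and image generation, whereas this sketch assumes the audio and image features have already been mapped into a shared space, substitutes plain cosine similarity for both matching networks, and selects an existing image rather than generating one. All names (`cosine`, `top_k_audio`, `best_image`, `vote_image`) and the toy vectors are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_audio(query, audio_bank, k):
    """Step 2 (stand-in): indices of the k stored audio clips closest to the query."""
    ranked = sorted(
        range(len(audio_bank)),
        key=lambda i: cosine(query, audio_bank[i]),
        reverse=True,
    )
    return ranked[:k]


def best_image(audio_vec, image_bank):
    """Step 3a (stand-in): index of the image closest to one matched audio clip."""
    return max(range(len(image_bank)), key=lambda j: cosine(audio_vec, image_bank[j]))


def vote_image(query, audio_bank, image_bank, k=3):
    """Step 3b: each matched audio clip votes for an image; the most frequent wins."""
    matched = top_k_audio(query, audio_bank, k)
    votes = Counter(best_image(audio_bank[i], image_bank) for i in matched)
    return votes.most_common(1)[0][0]


# Toy features, assumed to already live in the shared space (step 1).
audio_bank = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.8, 0.3]]
image_bank = [[1.0, 0.0], [0.0, 1.0]]
query = [1.0, 0.3]

winner = vote_image(query, audio_bank, image_bank, k=3)
print("voted image index:", winner)
```

Voting over several matched audio clips, rather than trusting the single nearest image, is what makes the final choice less sensitive to one poor cross-modal match; how well that carries over to the learned networks in the letter is not something this sketch can show.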