Locally controllable network based on visual–linguistic relation alignment for text-to-image generation.
- Source :
- Multimedia Systems. Feb 2024, Vol. 30 Issue 1, p1-13. 13p.
- Publication Year :
- 2024
Abstract
- Since locally controllable text-to-image generation cannot achieve satisfactory results in detail, a novel locally controllable text-to-image generation network based on visual–linguistic relation alignment is proposed. The goal of the method is to complete image processing and generation semantically through text guidance. The proposed method explores the relationship between text and image to achieve local control of text-to-image generation. The visual–linguistic matching learns the similarity weights between image and text through semantic features to achieve the fine-grained correspondence between local images and words. The instance-level optimization function is introduced into the generation process to accurately control the weight with low similarity and combine with text features to generate new visual attributes. In addition, a local control loss is proposed to preserve the details of the text and local regions of the image. Extensive experiments demonstrate the superior performance of the proposed method and enable more accurate control of the original image. [ABSTRACT FROM AUTHOR]
Details
- Language :
- English
- ISSN :
- 0942-4962
- Volume :
- 30
- Issue :
- 1
- Database :
- Academic Search Index
- Journal :
- Multimedia Systems
- Publication Type :
- Academic Journal
- Accession number :
- 174903626
- Full Text :
- https://doi.org/10.1007/s00530-023-01222-7