
ImageVista: Training-Free Text-to-Image Generation with Multilingual Input Text

Shamina Kaushar, Yash Agarwal, Anirban Saha, Dipanjan Pramanik, Nabanita Das, Bikash Sadhukhan

2024 2nd International Conference on Intelligent Data Communication Technologies and Internet of Things (IDCIoT) (2024)

Abstract
This study presents a text-to-image generation approach based on the contrastive language-image pretraining (CLIP) + generative adversarial network (GAN) paradigm, which optimizes semantic relevance within a pretrained GAN's latent space. The method facilitates zero-shot generation and allows for modifications using various generators. To address the challenges of CLIP score optimization within the GAN domain, this study adopts the FuseDream pipeline. This pipeline improves image quality through three components: the AugCLIP score, an optimization strategy for efficiently navigating nonconvex landscapes, and a composite generation technique for mitigating data bias. FuseDream produces high-quality images featuring diverse objects, backgrounds, and artistic styles based on textual prompts. Notably, it achieves top-tier Inception Score (IS) and Fréchet Inception Distance (FID) results on the MS COCO dataset. Code modifications are implemented to enhance FuseDream's efficacy in text-to-image synthesis, contributing to its versatility and robustness. Overall, the study demonstrates the effectiveness of the proposed approach, particularly in overcoming challenges related to CLIP score optimization within the GAN framework. The FuseDream pipeline emerges as a comprehensive solution that combines optimization strategies and diverse generation techniques for image synthesis.
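The abstract names the AugCLIP score without defining it. As we understand it from the FuseDream line of work, the idea is to average the CLIP image-text similarity over several random augmentations of the image, so that the optimizer cannot exploit a single fragile view. The following is only a toy sketch of that averaging step under stated assumptions: `toy_image_encoder`, `random_augment`, and `aug_score` are hypothetical names, the "encoder" is an identity map on small vectors rather than a real CLIP model, and Gaussian jitter stands in for real image augmentations.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def toy_image_encoder(image):
    # Hypothetical stand-in for a CLIP image encoder: here the "image"
    # is already a feature vector, so it is returned unchanged.
    return list(image)


def random_augment(image, rng, noise=0.1):
    # Hypothetical augmentation: Gaussian jitter on each component.
    # A real pipeline would apply image transforms instead.
    return [x + rng.gauss(0.0, noise) for x in image]


def aug_score(image, text_features, n_aug=32, seed=0):
    """Average image-text similarity over n_aug random augmentations."""
    rng = random.Random(seed)
    scores = [
        cosine(toy_image_encoder(random_augment(image, rng)), text_features)
        for _ in range(n_aug)
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    text = [1.0, 0.0, 0.0, 0.0]
    print(aug_score([1.0, 0.0, 0.0, 0.0], text))  # image aligned with text
    print(aug_score([0.0, 1.0, 0.0, 0.0], text))  # image orthogonal to text
```

In an actual CLIP + GAN setup, the image would come from a generator evaluated at a latent vector, and this averaged score would serve as the objective maximized over that latent; the sketch only illustrates the averaging itself.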
Key words
Generative Adversarial Network (GAN), Contrastive Language-Image Pretraining (CLIP), Augmented Contrastive Language-Image Pretraining (AugCLIP), Frechet Inception Distance (FID), Inception Score (IS), Vector Quantized Generative Adversarial Network (VQGAN)