
DMF-GAN: Deep Multimodal Fusion Generative Adversarial Networks for Text-to-Image Synthesis.

IEEE Trans. Multim. (2024)

Abstract
Text-to-image synthesis aims to generate high-quality, realistic images conditioned on a text description. The central challenge of this task lies in deeply and seamlessly integrating image and text information. Thus, in this paper, we propose a deep multimodal fusion generative adversarial network (DMF-GAN) that enables effective semantic interactions for fine-grained text-to-image generation. Specifically, through a novel recurrent semantic fusion network, DMF-GAN consistently manipulates the global assignment of text information among otherwise isolated fusion blocks. With the assistance of a multi-head attention module, DMF-GAN models word information from different perspectives and further improves semantic consistency. In addition, a word-level discriminator is proposed to provide the generator with fine-grained feedback related to each word. Compared with current state-of-the-art methods, the proposed DMF-GAN efficiently synthesizes realistic, text-aligned images and achieves better performance on challenging benchmarks. The code link: https://github.com/xueqinxiang/DMF-GAN
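The abstract mentions a multi-head attention module that lets image features attend to individual words from different "perspectives" (heads). The paper's repository defines the actual module; purely as an illustrative sketch of the underlying idea (not the authors' implementation, and with all shapes and names chosen here for the example), cross-modal multi-head attention can be written in a few lines of NumPy:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(image_feats, word_feats, num_heads=4):
    """Illustrative multi-head attention: image features act as queries,
    word embeddings as keys and values. Splitting the feature dimension
    across heads lets each head weight the words differently, i.e. model
    word information from several perspectives."""
    n_img, d = image_feats.shape
    assert d % num_heads == 0, "feature dim must divide evenly across heads"
    d_h = d // num_heads
    out = np.zeros_like(image_feats)
    for h in range(num_heads):
        sl = slice(h * d_h, (h + 1) * d_h)
        q, k, v = image_feats[:, sl], word_feats[:, sl], word_feats[:, sl]
        # Scaled dot-product attention over the words: (n_img, n_words)
        attn = softmax(q @ k.T / np.sqrt(d_h), axis=-1)
        out[:, sl] = attn @ v  # word-aware image features for this head
    return out

rng = np.random.default_rng(0)
img = rng.normal(size=(16, 32))    # e.g. a 4x4 spatial grid of 32-dim features
words = rng.normal(size=(8, 32))   # 8 word embeddings of the caption
fused = cross_modal_attention(img, words)
print(fused.shape)  # (16, 32): same grid, now conditioned on the words
```

In the full model, a learned projection per head and a residual connection would typically follow; this sketch keeps only the attention step that makes each spatial location a word-weighted mixture of the caption.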
Key words
Deep multimodal fusion, generative adversarial network, text-to-image (T2I) synthesis