
Align, Adapt and Inject: Audio-Guided Image Generation, Editing and Stylization.

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Abstract
Diffusion models have significantly advanced various image generative tasks, including image generation, editing, and stylization. While text prompts are the most common guidance in generative models, audio presents a valuable alternative: it inherently accompanies the corresponding scenes and provides abundant information for guiding image generative tasks. In this paper, we propose a novel and unified framework named Align, Adapt, and Inject (AAI) to explore the role of audio as a guidance cue, realizing audio-guided image generation, editing, and stylization within a single framework. Specifically, AAI first aligns the audio embedding with visual features, then adapts the aligned audio embedding into an AudioCue enriched with visual semantics, and finally injects the AudioCue into an existing Text-to-Image diffusion model in a plug-and-play manner. The experimental results demonstrate that AAI extracts rich information from audio and outperforms previous work on multiple image generative tasks.
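
As a rough illustration of the three-stage pipeline described in the abstract, the following minimal sketch (with assumed module names, dimensions, and a placeholder audio encoder, not the authors' released code) shows how an aligned audio embedding could be adapted into visually enriched AudioCue tokens and injected alongside the text-prompt embeddings of a frozen text-to-image diffusion model.

import torch
import torch.nn as nn

class AudioAdapter(nn.Module):
    # Hypothetical adapter: maps an audio embedding (already aligned with the
    # visual/text embedding space) to a short sequence of "AudioCue" tokens
    # that a text-to-image diffusion model can consume as extra conditioning.
    def __init__(self, audio_dim=512, cue_dim=768, num_tokens=4):
        super().__init__()
        self.proj = nn.Linear(audio_dim, cue_dim * num_tokens)
        self.num_tokens = num_tokens
        self.cue_dim = cue_dim

    def forward(self, audio_emb):                 # audio_emb: (batch, audio_dim)
        cue = self.proj(audio_emb)                # (batch, cue_dim * num_tokens)
        return cue.view(-1, self.num_tokens, self.cue_dim)

# 1) Align: obtain an audio embedding in a shared audio-visual space
#    (e.g., from a contrastively pretrained audio encoder; random placeholder here).
audio_emb = torch.randn(1, 512)

# 2) Adapt: turn the aligned embedding into AudioCue tokens.
adapter = AudioAdapter()
audio_cue = adapter(audio_emb)                    # shape (1, 4, 768)

# 3) Inject: concatenate the AudioCue with the text-prompt embeddings so the
#    diffusion model's cross-attention attends to both, without retraining the
#    backbone (plug-and-play). The text embeddings below are placeholders.
text_emb = torch.randn(1, 77, 768)
conditioning = torch.cat([text_emb, audio_cue], dim=1)

The dimensions (512-d audio embedding, 768-d cue tokens, 77 text tokens) are illustrative assumptions chosen to match common CLIP-style text encoders, not values reported by the paper.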
Keywords
Audio-guided image generative tasks, diffusion model, multi-modality alignment