Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts
CoRR (2024)
Abstract
Diffusion models have achieved remarkable success across a range of
generative tasks. Recent efforts to enhance diffusion model architectures have
reimagined them as a form of multi-task learning, where each task corresponds
to a denoising task at a specific noise level. While these efforts have focused
on parameter isolation and task routing, they fall short of capturing detailed
inter-task relationships and risk losing semantic information, respectively. In
response, we introduce Switch Diffusion Transformer (Switch-DiT), which
establishes inter-task relationships between conflicting tasks without
compromising semantic information. To achieve this, we employ a sparse
mixture-of-experts within each transformer block to utilize semantic
information and to mitigate conflicts between tasks through parameter
isolation. Additionally, we propose a diffusion prior loss that encourages
similar tasks to share their denoising paths while isolating conflicting
ones. Together, these give each transformer block an expert shared across
all tasks, and the resulting common and task-specific denoising paths
enable the diffusion model to construct its own beneficial way of
synergizing the denoising tasks. Extensive
experiments validate the effectiveness of our approach in improving both image
quality and convergence rate, and further analysis demonstrates that Switch-DiT
constructs tailored denoising paths across various generation scenarios.
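
The architectural idea in the abstract lends itself to a short sketch. Below is a minimal PyTorch sketch of a sparse mixture-of-experts feed-forward layer with one always-active shared expert plus top-k task-specific experts routed on the timestep embedding. The module names, dimensions, gating signal, and top-k choice are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SparseMoEFFN(nn.Module):
    """Sparse-MoE feed-forward layer: a shared expert applied to every
    denoising task plus top-k experts selected per sample from its timestep
    embedding. Illustrative sketch only, not the official Switch-DiT code."""

    def __init__(self, dim: int, num_experts: int = 4,
                 top_k: int = 2, hidden_mult: int = 4):
        super().__init__()

        def make_expert() -> nn.Module:
            return nn.Sequential(
                nn.Linear(dim, hidden_mult * dim),
                nn.GELU(),
                nn.Linear(hidden_mult * dim, dim),
            )

        self.shared_expert = make_expert()        # common denoising path
        self.experts = nn.ModuleList(make_expert() for _ in range(num_experts))
        self.gate = nn.Linear(dim, num_experts)   # routes on timestep embedding
        self.top_k = top_k

    def forward(self, x: torch.Tensor, t_emb: torch.Tensor) -> torch.Tensor:
        # x: (B, N, dim) token features; t_emb: (B, dim) timestep embedding.
        logits = self.gate(t_emb)                             # (B, num_experts)
        topk_val, topk_idx = logits.topk(self.top_k, dim=-1)
        weights = topk_val.softmax(dim=-1)                    # (B, top_k)

        routed = torch.zeros_like(x)                          # task-specific paths
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e                 # samples routed to e
                if mask.any():
                    w = weights[mask, slot].view(-1, 1, 1)
                    routed[mask] = routed[mask] + w * expert(x[mask])
        return self.shared_expert(x) + routed
```

Because the shared expert sits outside the gate, every noise level flows through a common denoising path, while the routed experts provide the parameter isolation that keeps conflicting denoising tasks apart.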
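The diffusion prior loss is described only at the level of its intent (similar tasks share denoising paths, conflicting ones are isolated), so the following is one hedged way to encode that intent: match the pairwise similarity of routing distributions to a prior kernel over normalized timesteps. The kernel shape, bandwidth, and MSE objective are assumptions; the paper's actual formulation may differ.

```python
import torch
import torch.nn.functional as F

def diffusion_prior_loss(gate_logits: torch.Tensor,
                         timesteps: torch.Tensor,
                         num_train_steps: int = 1000,
                         bandwidth: float = 0.1) -> torch.Tensor:
    """Surrogate prior loss: routing distributions of nearby noise levels are
    pulled together, distant ones pushed apart. Hypothetical formulation that
    only mirrors the abstract's stated intent.
    gate_logits: (B, num_experts) raw gate outputs; timesteps: (B,) ints."""
    probs = gate_logits.softmax(dim=-1)          # per-sample routing distributions
    t = timesteps.float() / num_train_steps      # normalize timesteps to [0, 1]
    # Prior kernel: 1 for identical noise levels, decaying with their distance.
    prior = torch.exp(-((t[:, None] - t[None, :]) ** 2) / bandwidth)
    # Observed pairwise routing similarity (each entry lies in [0, 1]).
    sim = probs @ probs.t()
    return F.mse_loss(sim, prior)
```

In training, a term like this would presumably be added to the standard denoising objective with a small weight, steering the gating network toward shared paths for neighboring noise levels without constraining the denoising loss itself.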