Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer
arXiv (2024)
Abstract
Diffusion models have shown remarkable performance in image generation in
recent years. However, due to a quadratic increase in memory when generating
ultra-high-resolution images (e.g. 4096*4096), the resolution of generated
images is often limited to 1024*1024. In this work, we propose a unidirectional
block attention mechanism that can adaptively adjust the memory overhead during
the inference process and handle global dependencies. Building on this module,
we adopt the DiT structure for upsampling and develop an infinite
super-resolution model capable of upsampling images of various shapes and
resolutions. Comprehensive experiments show that our model achieves SOTA
performance in generating ultra-high-resolution images in both machine and
human evaluation. Compared to commonly used UNet structures, our model uses
more than 5x less memory when generating 4096*4096 images. The project URL is
https://github.com/THUDM/Inf-DiT.
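
To make the scaling argument concrete, the following is a minimal sketch and not the authors' implementation. It shows how the pixel count grows quadratically with the side length, and one way a unidirectional block dependency could be laid out so that blocks can be generated in order while only earlier blocks are kept in memory. The block size of 128 and the neighbor set (top, left, top-left) are assumptions chosen for illustration; the abstract does not specify them, and all function names are hypothetical.

```python
def pixel_count(side: int) -> int:
    """Number of pixels in a square image with the given side length."""
    return side * side


def blocks_per_side(side: int, block: int) -> int:
    """Number of blocks along one side (ceiling division)."""
    return -(-side // block)


def unidirectional_neighbors(row: int, col: int) -> list[tuple[int, int]]:
    """Blocks that block (row, col) is allowed to attend to.

    Assumption for illustration: the block itself plus its top, left,
    and top-left neighbors. Dependencies point in one direction only,
    so blocks can be processed in raster order.
    """
    candidates = [(row, col), (row - 1, col), (row, col - 1), (row - 1, col - 1)]
    return [(r, c) for r, c in candidates if r >= 0 and c >= 0]


# Going from 1024*1024 to 4096*4096 multiplies the pixel count by 16.
ratio = pixel_count(4096) // pixel_count(1024)

# With a hypothetical block size of 128, a 4096*4096 image has 32 blocks per side.
grid = blocks_per_side(4096, 128)

# Each block looks at no more than 4 blocks, regardless of image size.
max_context = max(
    len(unidirectional_neighbors(r, c)) for r in range(grid) for c in range(grid)
)
```

Under these assumptions the per-block attention context stays constant as the image grows, which is the intuition behind adjusting memory overhead at inference time; how global dependencies are propagated across blocks is described in the paper itself.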