Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers
arXiv (2024)
Abstract
Recent advancements in diffusion models, particularly the architectural shift
from UNet-based diffusion models to the Diffusion Transformer (DiT), have
significantly improved the quality and scalability of image synthesis. Despite
their impressive generative quality, the heavy computational requirements of
these large-scale models significantly hinder deployment in real-world
scenarios. Post-training quantization (PTQ) offers a promising solution: it
compresses model size and speeds up inference for pretrained models without
any retraining. However, we observe that existing PTQ frameworks, designed for
ViTs and conventional diffusion models, fall into biased quantization and
incur remarkable performance degradation. In this paper, we find that DiTs
typically exhibit considerable variance in both weights and activations, which
easily exceeds the limited numerical range of low-bit representations. To
address this issue, we devise
Q-DiT, which seamlessly integrates three techniques: fine-grained quantization
to manage substantial variance across input channels of weights and
activations, an automatic search strategy to optimize the quantization
granularity and mitigate redundancies, and dynamic activation quantization to
capture the activation changes across timesteps. Extensive experiments on the
ImageNet dataset demonstrate the effectiveness of the proposed Q-DiT.
Specifically, when quantizing DiT-XL/2 to W8A8 on ImageNet 256x256, Q-DiT
reduces FID by a remarkable 1.26 compared to the baseline. Under
a W4A8 setting, it maintains high fidelity in image generation, showcasing only
a marginal increase in FID and setting a new benchmark for efficient,
high-quality quantization in diffusion transformers. Code is available at
\href{https://github.com/Juanerx/Q-DiT}{https://github.com/Juanerx/Q-DiT}.
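Two of the techniques named in the abstract lend themselves to a short illustration. Below is a minimal PyTorch sketch, not the authors' implementation: group-wise (fine-grained) fake quantization along the input channels of a weight matrix, and dynamic per-tensor activation quantization whose scale is recomputed from the current timestep's activations. All tensor shapes, the group size of 64, and the function names are illustrative assumptions; the automatic granularity search described in the paper would choose the group size per layer rather than fixing it.

```python
# Sketch of fine-grained weight quantization and dynamic activation
# quantization, as described at a high level in the Q-DiT abstract.
# Shapes, group size, and names are illustrative assumptions.
import torch

def quantize_groupwise(w: torch.Tensor, n_bits: int = 4, group_size: int = 64):
    """Asymmetric uniform fake-quantization with one scale/zero-point per
    group of `group_size` input channels, so a single outlier channel only
    distorts its own group rather than the whole row."""
    out_ch, in_ch = w.shape
    assert in_ch % group_size == 0
    g = w.reshape(out_ch, in_ch // group_size, group_size)
    w_min = g.amin(dim=-1, keepdim=True)
    w_max = g.amax(dim=-1, keepdim=True)
    qmax = 2 ** n_bits - 1
    scale = (w_max - w_min).clamp(min=1e-8) / qmax
    zero = (-w_min / scale).round()
    q = (g / scale + zero).round().clamp(0, qmax)
    return ((q - zero) * scale).reshape(out_ch, in_ch)  # dequantized view

def quantize_activation_dynamic(x: torch.Tensor, n_bits: int = 8):
    """Symmetric per-tensor dynamic quantization: the scale is derived from
    the activation at the current denoising timestep instead of a fixed
    calibration statistic, tracking how activations shift across timesteps."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = x.abs().amax().clamp(min=1e-8) / qmax
    return (x / scale).round().clamp(-qmax - 1, qmax) * scale

# Usage: simulate W4A8 for one linear layer at a single denoising step.
w = torch.randn(1152, 1152)    # DiT-XL/2-like hidden size (illustrative)
x = torch.randn(4, 256, 1152)  # (batch, tokens, channels)
y = quantize_activation_dynamic(x, 8) @ quantize_groupwise(w, 4).t()
```

The group-wise scheme here returns dequantized ("fake-quantized") tensors, the standard way to simulate low-bit accuracy before committing to integer kernels.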