Locally Hierarchical Auto-Regressive Modeling for Image Generation

NeurIPS 2022

Abstract
We propose a locally hierarchical auto-regressive model with multiple resolutions of discrete codes. In the first stage, our model represents an image with a pyramid of codes via a Hierarchically Quantized Variational AutoEncoder (HQ-VAE) and disentangles the information contained in the multi-level codes. In the case of two-level codes, we create two separate pathways: top codes carry the high-level coarse structure of input images, while a residual connection for bottom codes compensates for the missing fine details. An appropriate selection of resizing operations for the code embedding maps enables top codes to capture maximal information within images, and the first-stage algorithm achieves better performance on both vector quantization and image generation. In the second stage, a Hierarchically Quantized Transformer (HQ-Transformer) processes a sequence of local pyramids, each consisting of a single top code and its corresponding bottom codes. In contrast to other hierarchical models, we sample bottom codes in parallel by exploiting a conditional independence assumption on the bottom codes. This assumption arises naturally from our first-stage model, HQ-VAE, where the bottom codes learn to describe local details. On class-conditional and text-conditional generation benchmarks, our model achieves fidelity competitive with previous AR models while requiring a lighter computational budget.
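The parallel decoding step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `bottom_logits_fn` is a hypothetical stand-in for the HQ-Transformer's bottom-code head, and the shapes and vocabulary sizes are toy values. The key point it demonstrates is that, under the conditional independence assumption, all bottom codes of a local pyramid are drawn in one step given the top code, rather than one at a time autoregressively.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_local_pyramid(top_logits, bottom_logits_fn):
    """Sample one local pyramid: a top code, then its bottom codes in parallel.

    Assumes the bottom codes are conditionally independent given the top
    code, so they are all sampled from one set of logits in a single step.
    `bottom_logits_fn` is a hypothetical stand-in for a model head that
    returns logits of shape (num_bottom, bottom_vocab) for a given top code.
    """
    # Sample the top code from its categorical distribution.
    top_probs = np.exp(top_logits - top_logits.max())
    top_probs /= top_probs.sum()
    top_code = rng.choice(len(top_probs), p=top_probs)

    # One forward pass yields logits for every bottom position at once.
    bottom_logits = bottom_logits_fn(top_code)
    probs = np.exp(bottom_logits - bottom_logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)

    # Parallel sampling: each bottom code is drawn independently.
    bottom_codes = np.array([rng.choice(p.shape[0], p=p) for p in probs])
    return top_code, bottom_codes


# Toy usage: an 8-way top vocabulary and 4 bottom codes over a 16-way vocabulary.
toy_bottom_head = lambda top_code: rng.normal(size=(4, 16))
top, bottoms = sample_local_pyramid(rng.normal(size=8), toy_bottom_head)
```

In a real model, the independent bottom-code distributions would be produced by a learned network conditioned on the top code and its context; the independence assumption is what removes the sequential dependency among bottom positions.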