A Contrastive Compositional Benchmark for Text-to-Image Synthesis: A Study with Unified Text-to-Image Fidelity Metrics
CoRR(2023)
摘要
Text-to-image (T2I) synthesis has recently achieved significant advancements.
However, challenges remain in the model's compositionality, which is the
ability to create new combinations from known components. We introduce
Winoground-T2I, a benchmark designed to evaluate the compositionality of T2I
models. This benchmark includes 11K complex, high-quality contrastive sentence
pairs spanning 20 categories. These contrastive sentence pairs with subtle
differences enable fine-grained evaluations of T2I synthesis models.
Additionally, to address the inconsistency across different metrics, we propose
a strategy that evaluates the reliability of various metrics by using
comparative sentence pairs. We use Winoground-T2I with a dual objective: to
evaluate the performance of T2I models and the metrics used for their
evaluation. Finally, we provide insights into the strengths and weaknesses of
these metrics and the capabilities of current T2I models in tackling challenges
across a range of complex compositional categories. Our benchmark is publicly
available at https://github.com/zhuxiangru/Winoground-T2I .
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要