De-fine: Decomposing and Refining Visual Programs with Auto-Feedback.
CoRR (2023)
Abstract
Visual programming, a modular and generalizable paradigm, integrates
different modules and Python operators to solve various vision-language tasks.
Unlike end-to-end models that need task-specific data, it performs visual
processing and reasoning in an unsupervised manner. Current visual
programming methods generate the program for each task in a single pass and
lack the ability to evaluate and optimize it based on feedback, which
consequently limits their effectiveness for complex, multi-step problems.
Drawing inspiration from Benders decomposition, we
introduce De-fine, a general framework that automatically decomposes complex
tasks into simpler subtasks and refines programs through auto-feedback. This
model-agnostic approach can improve logical reasoning performance by
integrating the strengths of multiple models. Our experiments across various
visual tasks show that De-fine creates more accurate and robust programs,
setting new benchmarks in the field.
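
The abstract describes refining a program through automatic feedback rather than generating it in a single pass. The loop below is a minimal sketch of that general generate, evaluate, refine pattern, not the paper's implementation. Every name in it (refine_with_feedback, toy_generate, toy_evaluate) is hypothetical, and the toy generator and evaluator are stand-ins for whatever models a real system would call.

```python
from typing import Callable, List, Tuple


def refine_with_feedback(
    generate: Callable[[str, List[str]], str],
    evaluate: Callable[[str], Tuple[bool, str]],
    task: str,
    max_rounds: int = 3,
) -> str:
    """Regenerate a program until `evaluate` accepts it or rounds run out."""
    feedback: List[str] = []
    program = generate(task, feedback)  # the initial, single-pass attempt
    for _ in range(max_rounds):
        ok, message = evaluate(program)
        if ok:
            break
        feedback.append(message)  # accumulate feedback across rounds
        program = generate(task, feedback)
    return program


# Hypothetical stand-in for a program generator: it only adds the
# counting step after feedback has mentioned that it is missing.
def toy_generate(task: str, feedback: List[str]) -> str:
    steps = ["find(image, 'cup')"]
    if any("count" in note for note in feedback):
        steps.append("count(cups)")
    return "\n".join(steps)


# Hypothetical stand-in for an automatic evaluator.
def toy_evaluate(program: str) -> Tuple[bool, str]:
    if "count" not in program:
        return False, "missing count step"
    return True, "ok"


result = refine_with_feedback(
    toy_generate, toy_evaluate, "How many cups are there?"
)
```

In this toy run the first attempt is rejected, the evaluator's message is passed back to the generator, and the second attempt is accepted. Capping the loop with max_rounds is an assumption made here to keep the sketch terminating.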