Recursive Visual Programming
CoRR (2023)
Abstract
Visual Programming (VP) has emerged as a powerful framework for Visual
Question Answering (VQA). By generating and executing bespoke code for each
question, these methods demonstrate impressive compositional and reasoning
capabilities, especially in few-shot and zero-shot scenarios. However, existing
VP methods generate all code in a single function, resulting in code that is
suboptimal in terms of both accuracy and interpretability. Inspired by human
coding practices, we propose Recursive Visual Programming (RVP), which
simplifies generated routines, provides more efficient problem solving, and can
manage more complex data structures. RVP approaches VQA tasks with iterative,
recursive code generation, allowing complicated problems to be decomposed into
smaller parts. Notably, RVP
is capable of dynamic type assignment, i.e., as the system recursively
generates a new piece of code, it autonomously determines the appropriate
return type and crafts the requisite code to generate that output. We show
RVP's efficacy through extensive experiments on benchmarks including VSR, COVR,
GQA, and NextQA, underscoring the value of adopting human-like recursive and
modular programming techniques for solving VQA tasks through coding.
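
To make the recursive decomposition and dynamic return types concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a hypothetical `recursive_query` helper, a lookup table standing in for the language model that writes code, and a plain dictionary standing in for an image. All of these names are illustrative assumptions.

```python
from typing import Any, Dict

# Hypothetical stand-in for an LLM code generator: question -> Python source.
# A generated routine may call recursive_query on a simpler sub-question.
GENERATED_CODE: Dict[str, str] = {
    "Are there more cats than dogs?": (
        "def execute(image):\n"
        "    cats = recursive_query(image, 'How many cats are there?')\n"
        "    dogs = recursive_query(image, 'How many dogs are there?')\n"
        "    return cats > dogs\n"
    ),
    "How many cats are there?": (
        "def execute(image):\n"
        "    return image['objects'].count('cat')\n"
    ),
    "How many dogs are there?": (
        "def execute(image):\n"
        "    return image['objects'].count('dog')\n"
    ),
}


def generate_code(question: str) -> str:
    """Return source code for a question (a real system would query a model)."""
    return GENERATED_CODE[question]


def recursive_query(image: Dict[str, Any], question: str) -> Any:
    """Generate a routine for the question, run it, and return its result.

    The return type is not fixed in advance: each generated routine decides
    whether to return a bool, an int, or something else.
    """
    source = generate_code(question)
    # Expose recursive_query to the generated code so it can decompose further.
    namespace: Dict[str, Any] = {"recursive_query": recursive_query}
    exec(source, namespace)
    return namespace["execute"](image)


# Toy "image": a list of detected object labels (an assumption for this sketch).
image = {"objects": ["cat", "dog", "cat"]}
answer = recursive_query(image, "Are there more cats than dogs?")
```

In this sketch the top-level routine returns a boolean while the two sub-routines it spawns return integers, which mirrors the idea that each recursively generated piece of code chooses its own return type.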