Vision-Language Interpreter for Robot Task Planning.
CoRR(2023)
摘要
Large language models (LLMs) are accelerating the development of
language-guided robot planners. Meanwhile, symbolic planners offer the
advantage of interpretability. This paper proposes a new task that bridges
these two trends, namely, multimodal planning problem specification. The aim is
to generate a problem description (PD), a machine-readable file used by the
planners to find a plan. By generating PDs from language instruction and scene
observation, we can drive symbolic planners in a language-guided framework. We
propose a Vision-Language Interpreter (ViLaIn), a new framework that generates
PDs using state-of-the-art LLM and vision-language models. ViLaIn can refine
generated PDs via error message feedback from the symbolic planner. Our aim is
to answer the question: How accurately can ViLaIn and the symbolic planner
generate valid robot plans? To evaluate ViLaIn, we introduce a novel dataset
called the problem description generation (ProDG) dataset. The framework is
evaluated with four new evaluation metrics. Experimental results show that
ViLaIn can generate syntactically correct problems with more than 99% accuracy
and valid plans with more than 58% accuracy.
更多查看译文
关键词
robot,planning,vision-language
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要