Learning to Learn Faster from Human Feedback with Language Model Predictive Control
CoRR (2024)
Abstract
Large language models (LLMs) have been shown to exhibit a wide range of
capabilities, such as writing robot code from language commands – enabling
non-experts to direct robot behaviors, modify them based on feedback, or
compose them to perform new tasks. However, these capabilities (driven by
in-context learning) are limited to short-term interactions, where users'
feedback remains relevant for only as long as it fits within the context size
of the LLM, and can be forgotten over longer interactions. In this work, we
investigate fine-tuning robot code-writing LLMs to remember their
in-context interactions and improve their teachability, i.e., how efficiently
they adapt to human inputs (measured by the average number of corrections
before the user considers the task successful). Our key observation is that when
human-robot interactions are formulated as a partially observable Markov
decision process (in which human language inputs are observations, and robot
code outputs are actions), then training an LLM to complete previous
interactions can be viewed as training a transition dynamics model – that can
be combined with classic robotics techniques such as model predictive control
(MPC) to discover shorter paths to success. This gives rise to Language Model
Predictive Control (LMPC), a framework that fine-tunes PaLM 2 to improve its
teachability on 78 tasks across 5 robot embodiments – improving non-expert
teaching success rates of unseen tasks by 26.9% while reducing the average
number of human corrections from 2.4 to 1.9. Experiments show that LMPC also
produces strong meta-learners, improving the success rate of in-context
learning new tasks on unseen robot embodiments and APIs by 31.5%. See videos,
code, and demos at: https://robot-teaching.github.io/.