Beyond Text: Utilizing Vocal Cues to Improve Decision Making in LLMs for Robot Navigation Tasks
arXiv (2024)
Abstract
While LLMs excel at processing the text of human conversations, they
struggle with the nuances of verbal instructions in scenarios like social
navigation, where ambiguity and uncertainty can erode trust in robotic and
other AI systems. We address this shortcoming by moving beyond text and
additionally focusing on the paralinguistic features of audio responses:
the aspects of spoken communication that convey meaning and nuance not
through the literal wording (lexical content) but through how something is
said. We present Beyond Text, an approach that improves LLM decision-making
by integrating audio transcription with a subset of these features, chosen
for their focus on affect and their relevance to human-robot conversations.
This approach not only achieves a 70.26% winning rate, outperforming
existing LLMs by 22.16% to 48.30% (gemini-1.5-pro and gpt-3.5,
respectively), but also enhances robustness against token-manipulation
adversarial attacks, showing a 22.44% smaller decrease in winning rate than
the text-only language model. "Beyond Text" marks an advancement in social
robot navigation and broader human-robot interaction, seamlessly
integrating text-based guidance with human-audio-informed language models.
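The pipeline the abstract describes, transcribing speech while also extracting affect-related paralinguistic cues and feeding both to an LLM, can be sketched as below. This is a minimal illustration, not the paper's implementation: the specific features (RMS energy, zero-crossing rate, an autocorrelation pitch estimate) and the `build_prompt` fusion format are assumptions chosen for clarity.

```python
import numpy as np

def paralinguistic_features(audio: np.ndarray, sr: int = 16000) -> dict:
    """Extract a few simple affect-related cues from a mono waveform.

    Illustrative feature set only; the paper's exact features are not
    specified here.
    """
    rms = float(np.sqrt(np.mean(audio ** 2)))  # loudness proxy
    # noisiness proxy: fraction of samples where the signal crosses zero
    zcr = float(np.mean(np.abs(np.diff(np.sign(audio)))) / 2)
    # crude pitch estimate: autocorrelation peak in the 50-400 Hz band
    corr = np.correlate(audio, audio, mode="full")[len(audio) - 1:]
    lo, hi = sr // 400, sr // 50
    f0 = sr / (lo + int(np.argmax(corr[lo:hi])))
    return {"rms_energy": rms, "zero_crossing_rate": zcr, "pitch_hz": float(f0)}

def build_prompt(transcript: str, feats: dict) -> str:
    """Fuse the transcription with vocal cues into one text prompt
    that a downstream LLM can condition its navigation decision on."""
    cues = ", ".join(f"{k}={v:.2f}" for k, v in feats.items())
    return (f"Transcript: {transcript}\n"
            f"Vocal cues: {cues}\n"
            f"Decide the robot's next navigation action.")

# Demo: a 220 Hz sine stands in for one second of recorded speech.
t = np.linspace(0, 1, 16000, endpoint=False)
wave = 0.5 * np.sin(2 * np.pi * 220 * t)
prompt = build_prompt("Please go around me on the left.",
                      paralinguistic_features(wave))
```

In a real system the transcript would come from an ASR model and the cue string would be merged into the LLM's context alongside any task instructions.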