Improving Multiparty Interactions with a Robot Using Large Language Models

CHI Extended Abstracts (2023)

Abstract
Speaker diarization is a key component of systems that support multiparty interactions of co-located users, such as meeting facilitation robots. The goal is to identify who spoke what, often so the robot can provide feedback, moderate participation, and personalize its responses. Current systems perform diarization using a combination of acoustic features (e.g., pitch differences) and visual features (e.g., gaze), but these approaches require additional sensors or incur signal-processing overhead. Alternatively, since automatic speech recognition (ASR) is already a necessary step in the diarization pipeline, the transcribed text can be used directly to identify speaker labels in the conversation, eliminating such challenges. With that motivation, we leverage large language models (LLMs) to identify speaker labels from transcribed text and observe an exact-match rate of 77% and a word-level accuracy of 90%. We discuss our findings and the potential use of LLMs as a diarization tool for future systems.
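The abstract does not define its two metrics precisely, but a plausible reading is: "exact match" means the LLM-predicted speaker-label sequence matches the reference labels for every word, and "word-level accuracy" means the fraction of words assigned the correct speaker. The sketch below illustrates that reading; the data structure (a list of speaker/word pairs) and both function names are assumptions, not the paper's implementation.

```python
# Hedged sketch of the two evaluation metrics, assuming:
#   - exact match: every word's predicted speaker label equals the reference
#   - word-level accuracy: fraction of words with the correct speaker label
# Transcripts are represented as lists of (speaker, word) pairs in word order.

def exact_match(reference, predicted):
    """True iff the predicted label sequence matches the reference exactly."""
    return all(r[0] == p[0] for r, p in zip(reference, predicted))

def word_level_accuracy(reference, predicted):
    """Fraction of words whose predicted speaker label is correct."""
    assert len(reference) == len(predicted), "transcripts must align word-for-word"
    correct = sum(r[0] == p[0] for r, p in zip(reference, predicted))
    return correct / len(reference)

# Toy two-speaker transcript: the LLM mislabels one of four words.
reference = [("A", "hello"), ("A", "there"), ("B", "hi"), ("B", "everyone")]
predicted = [("A", "hello"), ("A", "there"), ("B", "hi"), ("A", "everyone")]

print(exact_match(reference, predicted))          # False: one label differs
print(word_level_accuracy(reference, predicted))  # 0.75
```

Under this reading, word-level accuracy is always at least as high as the exact-match rate, which is consistent with the reported 90% vs. 77%.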