What’s This? A Voice and Touch Multimodal Approach for Ambiguity Resolution in Voice Assistants

Multimodal Interfaces and Machine Learning for Multimodal Interaction (2021)

Citations: 4 | Views: 13
Abstract
Human speech often contains ambiguity stemming from the use of demonstrative pronouns (DPs), such as “this” and “these.” While we can typically decipher which objects of interest DPs refer to based on context, modern-day voice assistants (VAs, such as Google Assistant and Siri) are not yet able to process queries containing such ambiguity. For instance, to humans, a question such as “how much is this?” can be clarified through visual reference (e.g., a buyer gestures to the seller the object they would like to purchase). To bridge this gap between human and machine cognition, we built and examined a touch + voice multimodal VA prototype that enables users to select key spatial information to embed as context and query the VA. The prototype converts the results of mobile, real-time object recognition and optical character recognition models into augmented reality buttons that represent features. Users can interact with and modify the selected features through a word grid. We conducted a study to investigate: 1) how touch performs as an additional modality to resolve ambiguity in queries, 2) how users use DPs when interacting with VAs, and 3) how users perceive a VA that can understand DPs. From this study we found that as the query becomes more complex, users prefer the multimodal VA over the standard VA without experiencing elevated cognitive load. Additionally, even though it took some getting used to, many participants eventually became comfortable with using DPs to interact with the multimodal VA and appreciated the improved human-likeness of human-VA conversations.
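The pipeline the abstract describes — recognition results turned into tappable feature buttons, with a selected button grounding a demonstrative pronoun — can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation; all function and field names are hypothetical.

```python
# Hypothetical sketch: detector/OCR outputs become tappable "feature
# buttons", and a selected button grounds a DP in the spoken query.
# Names and data shapes are illustrative, not from the paper.

def detections_to_buttons(detections):
    """Turn recognition results into button descriptors centered on each box."""
    return [
        {"label": d["label"],
         "center": ((d["box"][0] + d["box"][2]) / 2,
                    (d["box"][1] + d["box"][3]) / 2)}
        for d in detections
    ]

def resolve_query(query, selected_button):
    """Replace the first demonstrative pronoun with the selected object's label."""
    for dp in ("these", "this"):  # check the longer token first
        if dp in query:
            return query.replace(dp, f'the {selected_button["label"]}', 1)
    return query

detections = [{"label": "mug", "box": (10, 20, 110, 140)}]
buttons = detections_to_buttons(detections)
print(resolve_query("how much is this?", buttons[0]))
# → how much is the mug?
```

The key design idea in the paper is that the touch modality supplies exactly the spatial context the speech channel omits, so the resolution step is a lookup rather than an inference.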
Keywords
touch multimodal approach, ambiguity resolution, voice