Gesture Generation with Diffusion Models Aided by Speech Activity Information

ICMI '23 Companion: Companion Publication of the 25th International Conference on Multimodal Interaction (2023)

Citations: 0 | Views: 3
Abstract
This paper describes a gesture generation model based on state-of-the-art diffusion models. Novel adaptations were introduced to improve human-likeness and the appropriateness of motion relative to speech. Specifically, the main focus was to enhance gesture responsiveness to speech audio. We explored using a pre-trained Voice Activity Detector (VAD) to obtain more meaningful audio representations. The proposed model was submitted to the GENEA Challenge 2023. Perceptual experiments compared our model, labeled SH, with the other submissions to the challenge. The results indicated that our model achieved competitive levels of human-likeness. While its score for appropriateness to the agent's speech was lower than that of most entries, the differences from most models were not statistically significant at the chosen confidence level.
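The abstract does not specify how the VAD output conditions the gesture model, so as a minimal illustration of the idea of frame-level speech-activity features, the sketch below uses a toy energy-based detector in place of the paper's pre-trained VAD (the function name, frame length, and threshold are assumptions, not the authors' settings):

```python
import math

def frame_energy_vad(samples, sample_rate=16000, frame_ms=20, threshold=0.01):
    """Toy energy-based stand-in for a pre-trained VAD: returns a
    per-frame binary speech-activity mask (1 = speech, 0 = silence)
    that could be concatenated with audio features for conditioning."""
    hop = int(sample_rate * frame_ms / 1000)  # samples per frame
    mask = []
    for start in range(0, len(samples) - hop + 1, hop):
        frame = samples[start:start + hop]
        rms = math.sqrt(sum(x * x for x in frame) / hop)  # frame energy
        mask.append(1 if rms > threshold else 0)
    return mask

# Synthetic example: 100 ms of silence followed by 100 ms of a 220 Hz tone.
sr = 16000
silence = [0.0] * (sr // 10)
tone = [0.5 * math.sin(2 * math.pi * 220 * t / sr) for t in range(sr // 10)]
mask = frame_energy_vad(silence + tone, sample_rate=sr)
print(mask)  # → [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

A real system would instead use a trained VAD (e.g. a neural model), but the output shape is the same: one activity value per audio frame, time-aligned with the acoustic features fed to the diffusion model.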