Conan's Bow Tie: A Streaming Voice Conversion for Real-Time VTuber Livestreaming

International Conference on Intelligent User Interfaces (2024)

Abstract
Recent years have seen dramatic growth in Virtual YouTubers (VTubers) as a new business on social media platforms such as YouTube, Twitch, and TikTok. However, a significant challenge arises when VTuber voice actors face health issues or retire, jeopardizing the continuity of their avatars' recognizable voices. Our work is inspired by a potential solution reminiscent of Conan's Bow Tie voice changer in the popular animation Case Closed (i.e., Detective Conan). To make this a reality, we introduce VTuberBowTie, a user-friendly streaming voice conversion system for real-time VTuber livestreaming. We propose a streaming voice conversion approach that tackles two challenges inherent to conventional real-time voice conversion: limited context modeling and bidirectional context dependence. Rather than processing each chunk of the voice stream independently, our approach adopts a fully sequential structure that leverages the contextual information preceding each input chunk, thereby expanding the perceptual range and enabling seamless concatenation. Moreover, we developed a ready-to-use interactive interface for VTuberBowTie and deployed it on various computing platforms. Experimental results show that VTuberBowTie achieves high-quality voice conversion in a streaming manner, with a latency of 179.1 ms on CPU and 70.8 ms on GPU, while providing users with a friendly interactive experience.
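The abstract contrasts processing each chunk in isolation with a sequential structure that carries context from before each chunk. The sketch below illustrates only that general idea on a toy causal filter; it is not the authors' VTuberBowTie model, and the kernel, chunk size, and names such as `StreamingConv` and `process_independently` are hypothetical. Caching the last `len(kernel) - 1` input samples lets chunked output match full-sequence output exactly, whereas independently processed chunks are zero-padded and diverge at every chunk boundary.

```python
from typing import List, Sequence


def causal_conv(signal: Sequence[float], kernel: Sequence[float],
                history: Sequence[float] = ()) -> List[float]:
    """Causal 1-D convolution: y[t] = sum_j kernel[j] * x[t - j].

    `history` holds input samples that precede `signal`; any missing
    history is treated as zeros (zero left-padding).
    """
    k = len(kernel)
    # Keep at most k - 1 past samples; that is all the filter can see.
    past = list(history)[-(k - 1):] if k > 1 else []
    padded = [0.0] * (k - 1 - len(past)) + past + list(signal)
    out = []
    for t in range(len(signal)):
        acc = 0.0
        # x[t] sits at index t + k - 1 in the padded buffer.
        for j in range(k):
            acc += kernel[j] * padded[t + k - 1 - j]
        out.append(acc)
    return out


def process_independently(stream: Sequence[float], kernel: Sequence[float],
                          chunk_size: int) -> List[float]:
    """Hypothetical baseline: every chunk is processed with no past context."""
    out: List[float] = []
    for start in range(0, len(stream), chunk_size):
        out += causal_conv(stream[start:start + chunk_size], kernel)
    return out


class StreamingConv:
    """Hypothetical streaming filter that caches the preceding context."""

    def __init__(self, kernel: Sequence[float]):
        self.kernel = list(kernel)
        self.cache: List[float] = []

    def push(self, chunk: Sequence[float]) -> List[float]:
        y = causal_conv(chunk, self.kernel, self.cache)
        # Carry the most recent k - 1 input samples into the next call.
        keep = len(self.kernel) - 1
        self.cache = (self.cache + list(chunk))[-keep:] if keep else []
        return y


if __name__ == "__main__":
    kernel = [0.5, 0.3, 0.2]
    stream = [float(i % 7) for i in range(20)]
    full = causal_conv(stream, kernel)

    conv = StreamingConv(kernel)
    streamed: List[float] = []
    for start in range(0, len(stream), 4):
        streamed += conv.push(stream[start:start + 4])

    print(streamed == full)                                     # True
    print(process_independently(stream, kernel, 4) == full)     # False
```

In this toy setting the cache only needs to be as long as the filter's receptive field minus one sample; a learned streaming model would typically cache richer internal state, but the principle of conditioning each chunk on what came before it is the same.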