FACE-DUBBING++: LIP-SYNCHRONOUS, VOICE PRESERVING TRANSLATION OF VIDEOS

ICASSP Workshops (2023)

Abstract
In this paper, we propose a neural end-to-end system for voice-preserving, lip-synchronous video translation. The system combines multiple component models and produces a video of the original speaker speaking in the target language that is lip-synchronous with the target speech, yet maintains the emphases in speech, voice characteristics, and face video of the original speaker. The result is a video of a speaker speaking a language they do not actually know. For the evaluation, we present a user study of the complete system as well as separate evaluations of the individual components. Since no dataset is available for evaluating the whole system, we collect our own test set. The results indicate that our system is able to generate convincing videos of the original speaker speaking the target language while preserving the original speaker's characteristics.
Keywords
end-to-end video translation, speech translation, text-to-speech, voice conversion, lip generation