Visual dubbing pipeline with localized lip-sync and two-pass identity transfer

Dhyey Patel, Houssem Zouaghi, Sudhir Mudur, Eric Paquette, Serge Laforest, Martin Rouillard, Tiberiu Popa

Computers & Graphics (2022)

Abstract
Visual dubbing uses visual computing and deep learning to alter an actor's lip and mouth articulations so that they sync with the dubbed speech. It has the potential to greatly improve the content produced by the dubbing industry, for which the quality of the dubbed result is the primary concern. An important requirement is that lip-sync changes be localized to the mouth region and not affect the rest of the actor's face or the rest of the video frame. Current methods can create realistic-looking fake faces with expressions. However, many fail to localize lip-sync and suffer from quality problems such as identity loss, low resolution, blur, loss of facial skin features or color, and temporal jitter. These problems arise mainly because it is very difficult to train networks end-to-end to correctly disentangle the different visual dubbing parameters (pose, skin color, identity, lip movements, etc.). Our main contribution is a new visual dubbing pipeline in which, instead of training end-to-end, we incrementally apply a different disentangling technique for each parameter. The pipeline is composed of three main steps: pose alignment, identity transfer, and video reassembly. Expert models in each step are fine-tuned for the actor. We propose an identity transfer network with an added style block which, with pre-training, is able to decouple face components, specifically identity and expression, and which also works with short video clips such as TV ads. The pipeline also includes novel stages for temporal smoothing of the reenacted face, actor-specific super-resolution to retain fine facial details, and a second pass through the identity transfer network to preserve actor identity. Localization of lip-sync is achieved by restricting changes in the original video frame to just the actor's mouth region. The results are convincing, and a user survey also confirms their quality. Relevant quantitative metrics are included.
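The localization idea in the abstract, restricting changes to the mouth region, can be illustrated with a simple mask-based blend. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `composite_mouth_region`, the grayscale list-of-rows frame representation, and the soft mask with values in [0, 1] are all hypothetical choices made for illustration. The abstract does not say how the mouth region is obtained or blended.

```python
from typing import List

# Hypothetical frame representation: a grayscale image as rows of intensities.
Frame = List[List[float]]


def composite_mouth_region(original: Frame, reenacted: Frame, mask: Frame) -> Frame:
    """Blend reenacted pixels into the original frame, weighted by the mask.

    Assumption: mask values lie in [0, 1], with 1 inside the mouth region,
    0 elsewhere, and intermediate values on a feathered boundary.
    Pixels where the mask is 0 keep their original value.
    """
    if not (len(original) == len(reenacted) == len(mask)):
        raise ValueError("frames and mask must have the same height")

    output: Frame = []
    for orig_row, reen_row, mask_row in zip(original, reenacted, mask):
        if not (len(orig_row) == len(reen_row) == len(mask_row)):
            raise ValueError("frames and mask must have the same width")
        # Per-pixel linear interpolation between original and reenacted values.
        output.append([
            (1.0 - m) * o + m * r
            for o, r, m in zip(orig_row, reen_row, mask_row)
        ])
    return output


# Toy 3x3 example: only the centre pixel is fully replaced,
# and one neighbour is half blended.
original = [[10.0] * 3 for _ in range(3)]
reenacted = [[50.0] * 3 for _ in range(3)]
mask = [
    [0.0, 0.0, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 0.0],
]
result = composite_mouth_region(original, reenacted, mask)
```

In this toy example every pixel outside the mask is identical to the original frame, which is the property the abstract asks for. A real pipeline would operate on color frames and would need some way of producing the mask, neither of which is specified here.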
Keywords
Visual dubbing, Reenactment, Style transfer