GMTalker: Gaussian Mixture based Emotional talking video Portraits
CoRR (2023)
Abstract
Synthesizing high-fidelity and emotion-controllable talking video portraits,
with audio-lip sync, vivid expressions, realistic head poses, and eye blinks, has
been an important and challenging task in recent years. Most existing methods
struggle to achieve personalized, precise emotion control, to interpolate
continuously between different emotions, or to generate diverse motion. To
address these problems, we present GMTalker, a Gaussian mixture based emotional
talking portraits generation framework. Specifically, we propose a Gaussian
Mixture based Expression Generator (GMEG) which can construct a continuous and
multi-modal latent space, achieving more flexible emotion manipulation.
Furthermore, we introduce a normalizing flow based motion generator pretrained
on a dataset with a wide range of motion to generate diverse motions. Finally,
we propose a personalized emotion-guided head generator with an Emotion Mapping
Network (EMN) which can synthesize high-fidelity and faithful emotional video
portraits. Both quantitative and qualitative experiments demonstrate that our
method outperforms previous methods in image quality, photo-realism, emotion
accuracy, and motion diversity.
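
To make the idea of a continuous, multi-modal latent space more concrete, the following is a minimal sketch of a Gaussian mixture over emotions. It is not the GMEG described above: the two-dimensional latent space, the component parameters, and the names `COMPONENTS`, `sample_latent`, and `interpolate_components` are all assumptions chosen for illustration. Each emotion is given one Gaussian component; sampling from a component yields varied codes for the same emotion, and blending the parameters of two components gives intermediate emotions.

```python
import random

# Hypothetical 2-D latent space: one Gaussian component (mean, std) per emotion.
# These values are illustrative assumptions, not parameters from the paper.
COMPONENTS = {
    "happy": ([2.0, 0.0], [0.3, 0.3]),
    "sad": ([-2.0, 0.0], [0.3, 0.3]),
}


def sample_latent(emotion, rng):
    """Draw one latent code from the component assigned to `emotion`."""
    mean, std = COMPONENTS[emotion]
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


def interpolate_components(emotion_a, emotion_b, t):
    """Linearly blend the parameters of two components, with t in [0, 1]."""
    mean_a, std_a = COMPONENTS[emotion_a]
    mean_b, std_b = COMPONENTS[emotion_b]
    mean = [(1 - t) * a + t * b for a, b in zip(mean_a, mean_b)]
    std = [(1 - t) * a + t * b for a, b in zip(std_a, std_b)]
    return mean, std


rng = random.Random(0)
code = sample_latent("happy", rng)  # a random code near the "happy" mean
mid_mean, mid_std = interpolate_components("happy", "sad", 0.5)
```

In this sketch, `t = 0` reproduces the first emotion's component and `t = 1` the second, so sweeping `t` traces a continuous path between the two emotions. Linear blending of means and standard deviations is only one possible choice and is assumed here for simplicity.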