Learn2Talk: 3D Talking Face Learns from 2D Talking Face
CoRR (2024)
Abstract
Speech-driven facial animation methods generally fall into two classes, 3D
and 2D talking face, both of which have attracted considerable research
attention in recent years. However, to the best of our knowledge, research on
3D talking face has not gone as deep as that on 2D talking face in the
aspects of lip-synchronization (lip-sync) and speech perception. To bridge
the gap between the two sub-fields, we propose a learning framework named
Learn2Talk, which constructs a better 3D talking face network by exploiting
two areas of expertise from the field of 2D talking face. Firstly, inspired
by the audio-video sync network, a 3D lip-sync expert model is devised to
pursue lip-sync between audio and 3D facial motion. Secondly, a teacher model
selected from 2D talking face methods is used to guide the training of the
audio-to-3D-motion regression network, yielding higher 3D vertex accuracy.
Extensive experiments show the advantages of the proposed framework in terms
of lip-sync, vertex accuracy and speech perception, compared with
state-of-the-art methods. Finally, we show two applications of the proposed
framework: audio-visual speech recognition and speech-driven 3D Gaussian
Splatting based avatar animation.
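The lip-sync expert described above follows the SyncNet family of audio-video sync networks, which score how well an audio window matches a motion window via embedding similarity. As a rough illustration only (the paper's actual architecture and loss are not given here), a minimal sketch of such a sync loss in PyTorch might look as follows; the embedding dimension, batch shapes, and the function name `sync_loss` are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def sync_loss(audio_emb: torch.Tensor, motion_emb: torch.Tensor) -> torch.Tensor:
    """SyncNet-style loss sketch (hypothetical, not the paper's exact loss).

    Treats the cosine similarity between paired audio and 3D-motion
    embeddings as a sync probability and applies binary cross-entropy,
    with every pair in the batch assumed to be a positive (in-sync) pair.
    """
    # Cosine similarity per pair, in [-1, 1]
    sim = F.cosine_similarity(audio_emb, motion_emb, dim=-1)
    # Clamp into (0, 1) so it can be read as a probability for BCE
    prob = sim.clamp(1e-7, 1.0 - 1e-7)
    return F.binary_cross_entropy(prob, torch.ones_like(prob))

# Toy usage: a batch of 4 pairs of 128-d embeddings from the two encoders
audio_emb = torch.randn(4, 128)
motion_emb = torch.randn(4, 128)
loss = sync_loss(audio_emb, motion_emb)
```

In training, this scalar would be added to the vertex-regression objective so that the regressor is penalized when generated 3D facial motion drifts out of sync with the driving audio; negative (off-sync) pairs, omitted here for brevity, are typically included as well.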