VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation
CoRR (2024)
Abstract
Human image animation involves generating a video from a static image by
following a specified pose sequence. Current approaches typically adopt a
multi-stage pipeline that separately learns appearance and motion, which often
leads to appearance degradation and temporal inconsistencies. To address these
issues, we propose VividPose, an innovative end-to-end pipeline based on Stable
Video Diffusion (SVD) that ensures superior temporal stability. To enhance the
retention of human identity, we propose an identity-aware appearance controller
that integrates additional facial information without compromising other
appearance details such as clothing texture and background. This approach
ensures that the generated videos maintain high fidelity to the identity of the
human subject, preserving key facial features across various poses. To
accommodate diverse human body shapes and hand movements, we introduce a
geometry-aware pose controller that utilizes both dense rendering maps from
SMPL-X and sparse skeleton maps. This enables accurate alignment of pose and
shape in the generated videos, providing a robust framework capable of handling
a wide range of body shapes and dynamic hand movements. Extensive qualitative
and quantitative experiments on the UBCFashion and TikTok benchmarks
demonstrate that our method achieves state-of-the-art performance. Furthermore,
VividPose exhibits superior generalization capabilities on our proposed
in-the-wild dataset. Code and models will be made available.
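
The abstract describes two conditioning modules on top of Stable Video Diffusion: an identity-aware appearance controller that injects facial information alongside the reference-image appearance features, and a geometry-aware pose controller that fuses dense SMPL-X render maps with sparse skeleton maps. Below is a minimal, hypothetical sketch of how such controllers could be wired up; all class names, dimensions, and the fusion scheme are assumptions for illustration, not the authors' released implementation.

```python
# Hypothetical sketch of VividPose-style controllers. Module names, feature
# dimensions, and the fusion strategy are illustrative assumptions.
import torch
import torch.nn as nn


class GeometryAwarePoseController(nn.Module):
    """Fuses dense SMPL-X render maps with sparse skeleton maps into a single
    conditioning feature that can be added to the SVD denoiser's input."""

    def __init__(self, hidden_dim: int = 320):
        super().__init__()
        # Separate lightweight encoders for the two pose modalities.
        self.render_encoder = nn.Sequential(
            nn.Conv2d(3, hidden_dim, kernel_size=3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden_dim, hidden_dim, kernel_size=3, padding=1),
        )
        self.skeleton_encoder = nn.Sequential(
            nn.Conv2d(3, hidden_dim, kernel_size=3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden_dim, hidden_dim, kernel_size=3, padding=1),
        )
        # Zero-initialized projection so the controller initially contributes
        # nothing and does not perturb the pretrained SVD backbone.
        self.out_proj = nn.Conv2d(hidden_dim, hidden_dim, kernel_size=1)
        nn.init.zeros_(self.out_proj.weight)
        nn.init.zeros_(self.out_proj.bias)

    def forward(self, smplx_render: torch.Tensor, skeleton_map: torch.Tensor) -> torch.Tensor:
        # Both inputs: (batch * frames, 3, H, W) per-frame pose images.
        fused = self.render_encoder(smplx_render) + self.skeleton_encoder(skeleton_map)
        return self.out_proj(fused)


class IdentityAwareAppearanceController(nn.Module):
    """Appends a face-identity token to the reference image's appearance tokens
    so the denoiser can attend to both appearance and identity cues."""

    def __init__(self, face_dim: int = 512, clip_dim: int = 1024):
        super().__init__()
        self.face_proj = nn.Linear(face_dim, clip_dim)

    def forward(self, clip_tokens: torch.Tensor, face_embed: torch.Tensor) -> torch.Tensor:
        # clip_tokens: (batch, n_tokens, clip_dim) appearance tokens of the reference image.
        # face_embed:  (batch, face_dim) identity embedding from a face recognizer.
        face_token = self.face_proj(face_embed).unsqueeze(1)
        return torch.cat([clip_tokens, face_token], dim=1)


if __name__ == "__main__":
    pose_ctrl = GeometryAwarePoseController()
    app_ctrl = IdentityAwareAppearanceController()
    cond = pose_ctrl(torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64))
    tokens = app_ctrl(torch.randn(2, 257, 1024), torch.randn(2, 512))
    print(cond.shape, tokens.shape)  # (2, 320, 64, 64) and (2, 258, 1024)
```

The zero-initialized output projection mirrors a common practice for attaching new conditioning branches to a frozen or pretrained diffusion backbone, which is consistent with the paper's goal of preserving SVD's temporal stability while adding pose and identity control.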