Human Video Translation via Query Warping
CoRR (2024)
Abstract
In this paper, we present QueryWarp, a novel framework for temporally
coherent human motion video translation. Existing diffusion-based video editing
approaches rely solely on key and value tokens to ensure temporal
consistency, which sacrifices the preservation of local and structural regions.
In contrast, we consider complementary query priors by constructing
temporal correlations among query tokens from different frames. We first
extract appearance flows from source poses to capture continuous human
foreground motion. Then, during the denoising process of the diffusion
model, we use these appearance flows to warp the previous frame's query tokens,
aligning them with the current frame's queries. This query warping imposes
explicit constraints on the outputs of self-attention layers, effectively
guaranteeing temporally coherent translation. Experiments on various human
motion video translation tasks demonstrate that our QueryWarp framework
surpasses state-of-the-art methods both qualitatively and quantitatively.
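The core operation described above, warping the previous frame's query tokens along an appearance flow before the self-attention layer, can be sketched as follows. This is an illustrative toy implementation, not the paper's code: the function name `warp_queries`, the spatial token layout, and the use of nearest-neighbor sampling (the paper's flows would more plausibly use bilinear interpolation) are all assumptions.

```python
import numpy as np

def warp_queries(prev_q, flow):
    """Warp previous-frame query tokens with a per-position appearance flow.

    prev_q: (H, W, C) query tokens of frame t-1, laid out spatially
            (hypothetical layout for illustration).
    flow:   (H, W, 2) offsets (dx, dy) mapping each frame-t position back
            to its source position in frame t-1.
    Uses nearest-neighbor sampling with border clamping for brevity.
    """
    H, W, _ = prev_q.shape
    warped = np.empty_like(prev_q)
    for y in range(H):
        for x in range(W):
            dx, dy = flow[y, x]
            # Clamp the sampled coordinate to the token grid.
            sx = int(np.clip(round(x + dx), 0, W - 1))
            sy = int(np.clip(round(y + dy), 0, H - 1))
            warped[y, x] = prev_q[sy, sx]
    return warped
```

The warped tokens could then constrain the current frame's queries (for example, by blending or replacing them) so that the self-attention output stays temporally aligned with the previous frame.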