DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions
arXiv (2024)
Abstract
Generating natural hand-object interactions in 3D is challenging as the
resulting hand and object motions are expected to be physically plausible and
semantically meaningful. Furthermore, generalization to unseen objects is
hindered by the limited scale of available hand-object interaction datasets. We
propose DiffH2O, a novel method to synthesize realistic one- or two-handed
object interactions from provided text prompts and the geometry of the object. The
method introduces three techniques that enable effective learning from limited
data. First, we decompose the task into a grasping stage and a text-based
interaction stage and use separate diffusion models for each. In the grasping
stage, the model generates only hand motions, whereas in the interaction stage
both hand and object poses are synthesized. Second, we propose a compact
representation that tightly couples hand and object poses. Third, we propose
two different guidance schemes to allow more control of the generated motions:
grasp guidance and detailed textual guidance. Grasp guidance takes a single
target grasping pose and guides the diffusion model to reach this grasp at the
end of the grasping stage, which provides control over the grasping pose. Given
a grasping motion from this stage, multiple different actions can be prompted
in the interaction stage. For textual guidance, we contribute comprehensive
text descriptions to the GRAB dataset and show that they enable our method to
have more fine-grained control over hand-object interactions. Our quantitative
and qualitative evaluation demonstrates that the proposed method outperforms
baseline methods and leads to natural hand-object motions. Moreover, we
demonstrate the practicality of our framework by utilizing a hand pose estimate
from an off-the-shelf pose estimator for guidance, and then sampling multiple
different actions in the interaction stage.
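
To make the grasp-guidance idea concrete, here is a minimal toy sketch of guiding a reverse-diffusion loop so that the last frame of a generated trajectory approaches a target pose. It is not the authors' implementation. Everything in it is an assumption made for illustration: the function names `guided_sampling` and `distance`, the `guidance_scale` parameter, the three-number "pose" vector, the noise schedule, and the stand-in denoiser (a simple shrink toward zero that replaces a learned network). The guidance term is a plain gradient step on the squared distance between the final frame and the target, which is one common way to steer diffusion sampling and may differ from the scheme used in the paper.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two pose vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def guided_sampling(target_pose, num_frames=8, num_steps=50,
                    guidance_scale=0.5, seed=0):
    """Toy reverse process over a trajectory of pose vectors.

    After every denoising step, the final frame is nudged toward
    target_pose. With guidance_scale=0.0 no guidance is applied.
    """
    rng = random.Random(seed)
    dim = len(target_pose)
    # Start from Gaussian noise, one pose vector per frame.
    traj = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(num_frames)]

    for step in range(num_steps, 0, -1):
        noise_level = step / num_steps

        # Stand-in for a learned denoiser (assumption): shrink toward zero.
        traj = [[0.9 * x for x in frame] for frame in traj]

        # Guidance: gradient of 0.5 * ||last_frame - target||^2
        # with respect to the last frame, applied as a descent step.
        grad = [x - t for x, t in zip(traj[-1], target_pose)]
        traj[-1] = [x - guidance_scale * g for x, g in zip(traj[-1], grad)]

        # Re-inject a small amount of noise on all but the final step.
        if step > 1:
            traj = [[x + 0.05 * noise_level * rng.gauss(0.0, 1.0)
                     for x in frame] for frame in traj]
    return traj


if __name__ == "__main__":
    target = [1.0, -0.5, 0.25]  # hypothetical target grasp pose
    guided = guided_sampling(target)
    unguided = guided_sampling(target, guidance_scale=0.0)
    print("guided end-frame error:  ", distance(guided[-1], target))
    print("unguided end-frame error:", distance(unguided[-1], target))
```

In this toy setting the guided trajectory ends much closer to the target than the unguided one, but not exactly on it, because the stand-in denoiser keeps pulling every frame toward zero. A full pipeline would then pass the end of this grasping trajectory to a second, text-conditioned model for the interaction stage.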