Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers
CoRR(2024)
摘要
3D human pose estimation captures the human joint points in three-dimensional
space while keeping the depth information and physical structure. That is
essential for applications that require precise pose information, such as
human-computer interaction, scene understanding, and rehabilitation training.
Due to the challenges in data collection, mainstream datasets of 3D human pose
estimation are primarily composed of multi-view video data collected in
laboratory environments, which contains rich spatial-temporal correlation
information besides the image frame content. Given the remarkable
self-attention mechanism of transformers, capable of capturing the
spatial-temporal correlation from multi-view video datasets, we propose a
multi-stage framework for 3D sequence-to-sequence (seq2seq) human pose
detection. Firstly, the spatial module represents the human pose feature by
intra-image content, while the frame-image relation module extracts temporal
relationships and 3D spatial positional relationship features between the
multi-perspective images. Secondly, the self-attention mechanism is adopted to
eliminate the interference from non-human body parts and reduce computing
resources. Our method is evaluated on Human3.6M, a popular 3D human pose
detection dataset. Experimental results demonstrate that our approach achieves
state-of-the-art performance on this dataset.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要