Pose-aware video action segmentation

Meijing Zhang, Chenyang Liao, Qi Li, Hua Zhang, Wenxi Liu

Neural Computing and Applications (2024)

Abstract
Action segmentation is an emerging task in video understanding, particularly for untrimmed videos containing multiple actions. Existing video-based methods can struggle because they are sensitive to visual factors, while skeleton-based methods may not capture enough information from human poses to segment actions accurately. To overcome these limitations, we propose a novel approach that exploits the complementary information in video and human poses for action segmentation. To the best of our knowledge, this is the first attempt to exploit the complementarity of video and poses for this task. Specifically, we introduce a cross-modal salient sampling module that attentively integrates human pose information with temporal visual features. Our approach achieves state-of-the-art performance on two benchmarks, demonstrating its efficacy in leveraging both visual and pose information for action segmentation.
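The abstract does not specify how the cross-modal salient sampling module works internally, so the following is only a rough illustration of the general idea of attentively combining pose and video features. It is a generic scaled dot-product cross-attention, not the authors' module: per-frame pose features act as queries over per-frame video features, and each output is an attention-weighted mix of the video features. The function names `cross_modal_attention`, `softmax`, and `dot`, and the assumption that both modalities share one feature dimension, are hypothetical choices made for this sketch.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_modal_attention(pose_feats, video_feats):
    """Attend from pose features (queries) to video features (keys/values).

    pose_feats:  list of per-frame pose vectors, each of length D
    video_feats: list of per-frame video vectors, each of length D
    Returns one pose-guided video context vector per pose frame.
    Assumption: a generic stand-in, not the paper's sampling module.
    """
    dim = len(video_feats[0])
    scale = math.sqrt(dim)
    fused = []
    for query in pose_feats:
        # Similarity of this pose frame to every video frame.
        weights = softmax([dot(query, key) / scale for key in video_feats])
        # Weighted sum of video features, one value per feature dimension.
        context = [
            sum(w * frame[d] for w, frame in zip(weights, video_feats))
            for d in range(dim)
        ]
        fused.append(context)
    return fused


if __name__ == "__main__":
    pose = [[1.0, 0.0], [0.0, 1.0]]
    video = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
    for row in cross_modal_attention(pose, video):
        print([round(x, 3) for x in row])
```

In a real system the queries, keys, and values would typically pass through learned projections, and the fused features would feed a temporal segmentation head; both are omitted here to keep the sketch minimal.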
Key words
Action segmentation, Video understanding, Pose estimation