TranSkeleton: Hierarchical Spatial-Temporal Transformer for Skeleton-Based Action Recognition

IEEE Transactions on Circuits and Systems for Video Technology (2023)

Abstract
In skeleton-based action recognition, the dominant paradigm has been to extract motion features with temporal convolution and to model spatial correlations with graph convolution. However, it is difficult for temporal convolution to capture long-range dependencies effectively, and the commonly used multi-branch graph convolution incurs high complexity. In this paper, we propose TranSkeleton, a powerful Transformer framework that neatly unifies the spatial and temporal modeling of skeleton sequences. For temporal modeling, we propose a novel partition-aggregation temporal Transformer. It performs hierarchical temporal partition and aggregation, and can effectively capture both long-range dependencies and subtle temporal structures. A difference-aware aggregation approach is designed to reduce information loss during temporal aggregation. For spatial modeling, we propose a topology-aware spatial Transformer that exploits the prior structure of the human body to facilitate spatial correlation modeling. Extensive experiments on two challenging benchmark datasets demonstrate that TranSkeleton notably outperforms state-of-the-art methods.
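The two key ideas in the abstract can be illustrated with a minimal sketch. Both functions below are illustrative assumptions, not the authors' implementation: `topology_aware_attention` adds an additive attention bias on physically connected joints (one plausible way to inject the body-topology prior), and `difference_aware_pool` merges adjacent frames while keeping their difference, so fine temporal detail is not discarded during aggregation. All names and the specific bias/pooling choices are hypothetical.

```python
import numpy as np

def topology_aware_attention(x, adj, bias_strength=1.0):
    """Hypothetical sketch: self-attention over joints with a bias
    derived from the skeleton adjacency matrix.
    x: (J, C) joint features; adj: (J, J) 0/1 adjacency."""
    J, C = x.shape
    # Plain scaled dot-product logits between joint features.
    logits = x @ x.T / np.sqrt(C)
    # Inject the body-topology prior: physically connected joints
    # receive an additive attention bias (illustrative choice).
    logits = logits + bias_strength * adj
    # Row-wise softmax -> attention weights.
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ x  # attended joint features, shape (J, C)

def difference_aware_pool(x):
    """Hypothetical sketch of difference-aware temporal aggregation:
    merge each pair of adjacent frames, keeping both their mean and
    their difference. x: (T, C) with even T -> (T // 2, 2 * C)."""
    a, b = x[0::2], x[1::2]
    return np.concatenate([(a + b) / 2, b - a], axis=-1)

# Toy 3-joint chain 0-1-2 and random joint features.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
x = np.random.default_rng(0).normal(size=(3, 4))
out = topology_aware_attention(x, adj)
print(out.shape)  # (3, 4)

pooled = difference_aware_pool(np.arange(8, dtype=float).reshape(4, 2))
print(pooled.shape)  # (2, 4): mean channels followed by difference channels
```

In this sketch the topology bias simply raises attention between neighboring joints; the pooled output halves the temporal length while doubling channels, so the aggregation step loses less motion detail than plain averaging would.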
Keywords
Skeleton-based action recognition, spatial-temporal transformer, long-range temporal dependencies