Hierarchical Spatial-Temporal Transformer with Motion Trajectory for Individual Action and Group Activity Recognition

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

Abstract
Group activity recognition (GAR), which aims to simultaneously understand individual actions and the overall group activity in video clips, plays a fundamental role in video analysis. In this paper, we propose a novel reasoning network, the Hierarchical Spatial-Temporal Transformer (HSTT), for individual action and group activity recognition; it adaptively and jointly captures spatial-temporal dynamic interactions of varying degrees among actors. Specifically, we first design a hierarchical spatial-temporal Transformer that captures relationships at different levels in order to handle the unequal interactions among actors. Furthermore, our proposed spatial-temporal Transformer (STT) block fully mines long-range spatial-temporal interactions by means of a merge function and a cross-attention mechanism. In addition, we adopt a motion trajectory branch that provides complementary dynamic features to improve recognition performance. Extensive experiments on two public GAR datasets show that our approach achieves highly competitive performance compared with state-of-the-art methods.
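The abstract does not specify the internals of the STT block. As a rough illustration only, the following Python sketch assumes actor features arranged as T frames × N actors × D dimensions, applies self-attention across actors within each frame (spatial) and across frames for each actor (temporal), and then lets the spatial tokens cross-attend to the temporal tokens over all T×N positions to reach long-range spatial-temporal context. The element-wise average used as the "merge" step, the lack of learned projections, and the names `attention` and `stt_block_sketch` are hypothetical simplifications, not the authors' design.

```python
import math
from typing import List

Vector = List[float]
Tokens = List[Vector]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attention(queries: Tokens, keys: Tokens, values: Tokens) -> Tokens:
    """Scaled dot-product attention with no learned projections (simplification)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of value vectors, dimension by dimension.
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return out


def stt_block_sketch(x: List[Tokens]) -> List[Tokens]:
    """Hypothetical STT-style block; x[t][n] is the feature of actor n at frame t."""
    num_frames, num_actors = len(x), len(x[0])

    # Spatial attention: actors attend to each other within the same frame.
    spatial = [attention(x[t], x[t], x[t]) for t in range(num_frames)]

    # Temporal attention: each actor attends to its own features across frames.
    per_actor = []
    for n in range(num_actors):
        seq = [x[t][n] for t in range(num_frames)]
        per_actor.append(attention(seq, seq, seq))
    temporal = [[per_actor[n][t] for n in range(num_actors)] for t in range(num_frames)]

    # Cross attention: spatial tokens query all temporal tokens (T*N positions).
    s_flat = [tok for frame in spatial for tok in frame]
    t_flat = [tok for frame in temporal for tok in frame]
    crossed = attention(s_flat, t_flat, t_flat)

    # Assumed merge function: element-wise average of spatial and cross-attended tokens.
    merged = [[(a + b) / 2 for a, b in zip(s, c)] for s, c in zip(s_flat, crossed)]

    # Restore the T x N x D layout.
    return [merged[t * num_actors:(t + 1) * num_actors] for t in range(num_frames)]
```

Calling `stt_block_sketch` on a grid of 4 frames × 3 actors returns a grid of the same shape. A practical block would add learned query/key/value projections, multiple heads, residual connections, and feed-forward layers; they are omitted here to keep the data flow visible.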
Keywords
Group activity recognition, motion trajectories, spatial-temporal Transformer, graph neural network