Actor-Transformers for Group Activity Recognition

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)(2020)

引用 196|浏览237
暂无评分
摘要
This paper strives to recognize individual actions and group activities from videos. While existing solutions for this challenging problem explicitly model spatial and temporal relationships based on location of individual actors, we propose an actor-transformer model able to learn and selectively extract information relevant for group activity recognition. We feed the transformer with rich actor-specific static and dynamic representations expressed by features from a 2D pose network and 3D CNN, respectively. We empirically study different ways to combine these representations and show their complementary benefits. Experiments show what is important to transform and how it should be transformed. What is more, actor-transformers achieve state-of-the-art results on two publicly available benchmarks for group activity recognition, outperforming the previous best published results by a considerable margin.
更多
查看译文
关键词
group activity recognition,temporal relationships,actor-transformer model,spatial relationships,actor-specific static representation,actor-specific dynamic representation,2D pose network,3D CNN,convolutional neural nets
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要