Active Spatial Positions Based Hierarchical Relation Inference for Group Activity Recognition

IEEE Transactions on Circuits and Systems for Video Technology (2023)

Abstract
Group activity recognition aims to recognize behaviors that involve multiple individuals within a scene. Existing schemes rely on individual relation inference and usually take the individuals as tokens. In essence, they select the image regions most relevant to the group activity while filtering out irrelevant background noise. However, these schemes require individual bounding box labels in both the training and testing stages. Moreover, because individuals are usually represented at a single scale, multi-scale individuals cannot be combined effectively. In this paper, we present a novel end-to-end hierarchical relation inference framework based on active spatial positions for group activity recognition. The framework locates active spatial positions and uses them as visual tokens, inferring relations among the token embeddings. It requires individual bounding box labels only in the training stage, and it automatically eliminates the background once the active spatial positions have been located in the entire scene. Hierarchical relations can be inferred naturally from the visual tokens at different scales, which further improves performance. Experimental results demonstrate that the proposed framework is competitive with existing schemes that require more labor and computation to generate labels in both the training and testing stages.
Keywords
Group activity recognition, active spatial positions, hierarchical relation inference
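
The idea outlined in the abstract (locate active spatial positions, treat them as visual tokens, infer relations at several scales) can be illustrated with a toy sketch. This is not the authors' architecture: the function names, the hand-written score maps, the top-k selection rule, the projection-free dot-product attention, and the mean-and-concatenate fusion are all assumptions chosen to keep the example small and dependency-free.

```python
import math

def select_active_positions(feature_map, score_map, k):
    """Return the features of the k highest-scoring grid cells as tokens.

    Assumption: the score map is given. How the paper actually scores
    positions is not described in the abstract.
    """
    cells = [
        (score_map[r][c], feature_map[r][c])
        for r in range(len(score_map))
        for c in range(len(score_map[0]))
    ]
    cells.sort(key=lambda cell: cell[0], reverse=True)
    return [feat for _, feat in cells[:k]]

def self_attention(tokens):
    """Dot-product self-attention without learned projections (a simplification)."""
    dim = len(tokens[0])
    refined = []
    for query in tokens:
        logits = [
            sum(q * x for q, x in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        peak = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(value - peak) for value in logits]
        total = sum(weights)
        refined.append([
            sum(w / total * tok[i] for w, tok in zip(weights, tokens))
            for i in range(dim)
        ])
    return refined

def group_embedding(feature_maps, score_maps, k):
    """Concatenate the mean refined token from each scale (hypothetical fusion)."""
    embedding = []
    for fmap, smap in zip(feature_maps, score_maps):
        refined = self_attention(select_active_positions(fmap, smap, k))
        dim = len(refined[0])
        embedding += [
            sum(tok[i] for tok in refined) / len(refined) for i in range(dim)
        ]
    return embedding

# Two made-up scales: a 2x2 grid and a 1x1 grid, each cell holding a 2-d feature.
fine_features = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.0, 0.0]]]
fine_scores = [[0.9, 0.8], [0.1, 0.0]]
coarse_features = [[[2.0, 2.0]]]
coarse_scores = [[0.7]]

embedding = group_embedding(
    [fine_features, coarse_features], [fine_scores, coarse_scores], k=2
)
print(len(embedding))  # one 2-d summary per scale, concatenated
```

In this sketch, low-scoring cells never become tokens, which mirrors the abstract's point that background is discarded once the active positions are located. The per-scale loop is only a stand-in for the hierarchical inference the paper describes.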