Online Hierarchical Linking of Action Tubes for Spatio-Temporal Action Detection Based on Multiple Clues

Shaowen Su, Yan Zhang

IEEE Access (2024)

Abstract
The spatio-temporal action detection task requires outputting the temporal and spatial positions, as well as the action category, of each target action instance in the form of an action tube. However, the current definition of video-level metrics for spatio-temporal action detection is neither clear nor unified enough to fully characterize a network model's spatio-temporal detection ability. Furthermore, existing tube-linking methods not only depend heavily on the quality of the detection stage but also lack reliable linking criteria, which results in poor tube-linking performance. To address these issues, this study proposes a hierarchical linking method based on multiple clues. The method first dynamically exploits various correlation clues at two levels, including appearance features, spatial overlap, motion prediction, category scores, tube length, and tube confidence status, to reduce the negative impact of unreliable information on the correlation. It then uses inter-class correlation to handle the mutual influence between different categories, followed by joint probabilistic data association (JPDA) to handle the mutual influence between correlated objects, ultimately achieving robust and accurate online linking of action tubes. Experimental comparisons with other correlation methods on the untrimmed UCF24 and MultiSports datasets show state-of-the-art tube-linking performance. Ablation experiments further explore the impact of the different modules and stages of the proposed tube-linking method.
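The abstract does not specify the scoring functions, weights, or the two-level hierarchy of the proposed method. The Python sketch below illustrates only the basic idea of fusing several clues into one association score for online, frame-by-frame tube linking. It combines appearance cosine similarity, box IoU with the tube's last box, IoU with a constant-velocity motion prediction, and the detection's score for the tube's class. Everything here is a hypothetical assumption rather than the authors' method: the `Tube`/`Detection` classes, the weights, the 0.5 threshold, and especially the greedy one-to-one matching, which stands in for the paper's inter-class correlation and JPDA stages.

```python
import math
from dataclasses import dataclass, field


def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Cosine similarity between two appearance feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


@dataclass
class Detection:  # hypothetical per-frame detector output
    box: tuple    # (x1, y1, x2, y2)
    feat: list    # appearance embedding
    scores: list  # per-class confidence scores


@dataclass
class Tube:  # hypothetical action tube under construction
    label: int
    boxes: list = field(default_factory=list)
    feat: list = field(default_factory=list)

    def predicted_box(self):
        """Constant-velocity guess of the next box from the last two boxes."""
        if len(self.boxes) < 2:
            return self.boxes[-1]
        prev, last = self.boxes[-2], self.boxes[-1]
        return tuple(2 * c - p for c, p in zip(last, prev))


def affinity(tube, det, w_app=0.3, w_iou=0.3, w_motion=0.2, w_cls=0.2):
    """Weighted sum of four clues; the weights are illustrative assumptions."""
    return (w_app * cosine(tube.feat, det.feat)
            + w_iou * box_iou(tube.boxes[-1], det.box)
            + w_motion * box_iou(tube.predicted_box(), det.box)
            + w_cls * det.scores[tube.label])


def link_frame(tubes, dets, threshold=0.5):
    """Greedily extend tubes with one frame of detections (single level, no JPDA)."""
    pairs = sorted(
        ((affinity(t, d), ti, di)
         for ti, t in enumerate(tubes) for di, d in enumerate(dets)),
        reverse=True,
    )
    used_t, used_d = set(), set()
    for score, ti, di in pairs:
        if score < threshold:
            break  # all remaining pairs score even lower
        if ti in used_t or di in used_d:
            continue  # at most one detection per tube and one tube per detection
        tubes[ti].boxes.append(dets[di].box)
        tubes[ti].feat = dets[di].feat  # keep only the latest appearance feature
        used_t.add(ti)
        used_d.add(di)
    # Unmatched detections start new tubes labeled with their top-scoring class.
    for di, d in enumerate(dets):
        if di not in used_d:
            label = max(range(len(d.scores)), key=lambda c: d.scores[c])
            tubes.append(Tube(label=label, boxes=[d.box], feat=d.feat))
    return tubes
```

Calling `link_frame` once per incoming frame keeps the procedure online, since each decision uses only past and current detections. A practical linker would also terminate tubes that stay unmatched for several frames and would weight clues adaptively according to their reliability, as the abstract describes; this sketch omits both.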
Keywords
MCHL, spatio-temporal action detection, linking method, untrimmed video mAP