Com-STAL: Compositional Spatio-Temporal Action Localization

IEEE Transactions on Circuits and Systems for Video Technology (2023)

Abstract
Spatio-temporal action localization aims to locate the spatial and temporal positions of actors and classify their actions. However, prior research overlooks the fact that, in real-world scenarios, human actions often involve novel objects. It thereby neglects the variety of action-object combinations, which considerably limits the generalization of the resulting models. In this paper, we study action-object combinations by examining their multi-modal visual information. To this end, we propose a novel compositional spatio-temporal action localization (Com-STAL) task, in which the training and test sets contain non-overlapping action-object combinations. Based on this task, we construct a compositional action localization dataset (Com-AD). We further propose a simple yet effective framework, the Instance-Centric Interaction Network (ICIN), which reduces invalid induction biases within the visual modality and alleviates the combined distribution bias by leveraging information from additional modalities. Extensive experimental results on Com-AD demonstrate the superior action localization performance of ICIN.
Keywords
Spatio-temporal action localization, compositional spatio-temporal action localization, induction bias, combined distribution bias
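
The abstract states that Com-STAL uses training and test sets with non-overlapping action-object combinations. The listing does not describe how Com-AD is actually built, so the following is only a minimal sketch of that idea under stated assumptions: the sample fields (`clip`, `action`, `object`), the function names, and the toy data are all hypothetical and are not taken from the paper.

```python
def compositional_split(samples, test_pairs):
    """Route each sample to test if its (action, object) pair is held out,
    otherwise to train, so that no pair appears in both sets.

    Hypothetical helper: not the Com-AD construction procedure.
    """
    held_out = set(test_pairs)
    train, test = [], []
    for sample in samples:
        pair = (sample["action"], sample["object"])
        (test if pair in held_out else train).append(sample)
    return train, test


def pairs(split):
    """Collect the set of (action, object) combinations present in a split."""
    return {(s["action"], s["object"]) for s in split}


# Toy annotations (hypothetical): every action and every object is seen in
# training, but the specific held-out combinations are not.
samples = [
    {"clip": "v1", "action": "open", "object": "door"},
    {"clip": "v2", "action": "open", "object": "box"},
    {"clip": "v3", "action": "push", "object": "door"},
    {"clip": "v4", "action": "push", "object": "box"},
]

train, test = compositional_split(samples, [("open", "box"), ("push", "door")])

# The combinations in the two splits do not overlap.
assert pairs(train).isdisjoint(pairs(test))
```

In this toy split, each action and each object still occurs individually in the training data, so a model evaluated on the test set is asked to recognize familiar actions and objects in combinations it has not seen.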