
Appearance-Agnostic Representation Learning for Compositional Action Recognition

IEEE Transactions on Circuits and Systems for Video Technology (2024)

Abstract
Compositional generalization in action recognition, i.e., Compositional Action Recognition (CAR), has recently received increasing attention. CAR challenges models to recognize unseen combinations of actions and objects, with the primary challenge being the distribution shift from training to testing. Most previous approaches for CAR incorporate supplementary object annotations (e.g., bounding boxes and object categories) to learn an instance-centric dynamic representation. However, these methods inevitably introduce stronger visual inductive biases, including object appearance and background bias, that impact generalization performance, particularly in out-of-distribution scenarios. To this end, this work attempts to construct an appearance-agnostic de-biased representation by leveraging the powerful segmentation capability of the Segment Anything Model (SAM), which is the first exploration of SAM in the field of compositional action recognition. Specifically, we propose a novel SAM-driven Appearance-Agnostic Representation Learning (A²RL) framework for CAR, which contains two effective sub-modules: Fore-Back Mask (FBM) and Dynamic Relation Modeling (DRM). In FBM, we design a fine-grained instance-invisible and background-removed masking strategy to effectively weaken the strong connection between visual cues and action labels, as well as to minimize the impact of irrelevant factors. In DRM, we explore the potential association between the subjects and objects involved in one action and then build appearance-agnostic relational descriptors for dynamic modeling. Extensive experiments demonstrate the generalization ability of this work. Notably, FBM achieves significant improvements in all three compositional settings without adding any model parameters. The proposed framework also achieves state-of-the-art performance in comparison with the most recent methods in CAR.
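The abstract does not spell out how the Fore-Back Mask is computed, so the following is only a minimal sketch of what an "instance-invisible and background-removed" mask could look like, not the authors' implementation. It assumes per-instance boolean masks are already available (for example from a segmentation model such as SAM); the function name `fore_back_mask`, the flat per-instance fill levels, and the zeroed background are all hypothetical choices made for illustration.

```python
import numpy as np


def fore_back_mask(frame, instance_masks, fill_values=None):
    """Hypothetical sketch of a fore-back mask (not the paper's code).

    frame:          (H, W, 3) uint8 image.
    instance_masks: (N, H, W) boolean masks, one per subject/object,
                    assumed to come from a segmentation model.
    fill_values:    optional N gray levels used to paint each instance.

    Returns a new (H, W, 3) image where the background is zeroed out and
    every instance is replaced by a flat gray level, so that shape and
    position survive but texture and color do not.
    """
    frame = np.asarray(frame)
    masks = np.asarray(instance_masks, dtype=bool)
    n = masks.shape[0]

    if fill_values is None:
        # Assumption: one distinct flat gray level per instance.
        fill_values = np.linspace(255, 64, num=n).astype(np.uint8)

    # Start from an all-zero canvas: this removes the background.
    out = np.zeros_like(frame)

    # Paint each instance region with its flat value: appearance is hidden.
    for mask, value in zip(masks, fill_values):
        out[mask] = value
    return out


# Usage with a toy 4x4 frame and two square "instances".
frame = np.full((4, 4, 3), 200, dtype=np.uint8)
masks = np.zeros((2, 4, 4), dtype=bool)
masks[0, 0:2, 0:2] = True
masks[1, 2:4, 2:4] = True
masked = fore_back_mask(frame, masks)
```

Because such a step only rewrites pixels before they reach the recognition backbone, it adds no learnable parameters, which is consistent with the abstract's remark about FBM.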
Keywords
Compositional Action Recognition, Segment Anything Model, De-biased