ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition
CoRR (2024)
Abstract
Zero-shot action recognition (ZSAR) aims to learn an alignment model between
videos and class descriptions of seen actions that is transferable to unseen
actions. The text queries (class descriptions) used in existing ZSAR works,
however, are often short action names that fail to capture the rich semantics
in the videos, leading to misalignment. With the intuition that video content
descriptions (e.g., video captions) can provide rich contextual information of
visual concepts in videos, we propose to utilize human annotated video
descriptions to enrich the semantics of the class descriptions of each action.
However, all existing action video description datasets are limited in the
number of actions they cover and in the richness of their video descriptions. To this end,
we collect a large-scale action video description dataset named ActionHub,
which covers a total of 1,211 common actions and provides 3.6 million action
video descriptions. With the proposed ActionHub dataset, we further propose a
novel Cross-modality and Cross-action Modeling (CoCo) framework for ZSAR, which
consists of a Dual Cross-modality Alignment module and a Cross-action
Invariance Mining module. Specifically, the Dual Cross-modality Alignment
module utilizes both action labels and video descriptions from ActionHub to
obtain rich class semantic features for feature alignment. The Cross-action
Invariance Mining module exploits a cycle-reconstruction process between the
class semantic feature spaces of seen actions and unseen actions, aiming to
guide the model to learn cross-action invariant representations. Extensive
experimental results demonstrate that our CoCo framework significantly
outperforms the state-of-the-art on three popular ZSAR benchmarks (i.e.,
Kinetics-ZSAR, UCF101, and HMDB51) under two different learning protocols in
ZSAR. We will release our code, models, and the proposed ActionHub dataset.
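The abstract does not give the exact form of the cycle-reconstruction process in the Cross-action Invariance Mining module, but the idea of reconstructing seen-class semantic features through the unseen-class feature space and back can be sketched as follows. This is a minimal, hypothetical illustration: the attention-based reconstruction, the function name, and the mean-squared cycle error are all assumptions, not the paper's actual formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cycle_reconstruction_loss(seen, unseen):
    """Hypothetical sketch of a cycle-reconstruction objective.

    seen:   (n_seen, d) class semantic features of seen actions
    unseen: (n_unseen, d) class semantic features of unseen actions

    Each seen-class feature is projected into the unseen-class
    feature space via attention weights, then reconstructed back;
    the mean squared cycle error encourages representations that
    remain consistent across the two action sets.
    """
    # attention from seen classes onto unseen classes
    a = softmax(seen @ unseen.T)       # (n_seen, n_unseen)
    proj = a @ unseen                  # seen features expressed via unseen space
    # attention from the projections back onto the seen classes
    b = softmax(proj @ seen.T)         # (n_seen, n_seen)
    recon = b @ seen                   # cycle-reconstructed seen features
    return float(np.mean((seen - recon) ** 2))
```

In training, a term like this would be minimized alongside the cross-modality alignment loss, nudging the model toward features that transfer between seen and unseen actions.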