A Snippets Relation and Hard-Snippets Mask Network for Weakly-Supervised Temporal Action Localization

IEEE Transactions on Circuits and Systems for Video Technology (2024)

Abstract
Weakly-supervised temporal action localization (WTAL) is the problem of learning an action localization model when only video-level labels are available. In recent years, many WTAL methods have been developed. However, existing approaches often do not consider hard-to-predict snippets near action boundaries, which causes action incompleteness and action over-completeness issues. To solve these issues, this work proposes an end-to-end snippets relation and hard-snippets mask network (SRHN). Specifically, a hard-snippets mask module adaptively masks the hard-to-predict snippets, so that the trained model focuses more on snippets with low uncertainty. Then, a snippets relation module is designed to capture the relationships among snippets; by aggregating information from multiple temporal receptive fields, it makes hard-to-predict snippets easier to predict. Finally, a snippet enhancement loss is developed to reduce the probabilities of actions that are not present in the video, for both hard-to-predict snippets and other snippets, while enlarging the probabilities of actions that are present. Extensive experiments on the THUMOS14, ActivityNet1.2, and ActivityNet1.3 datasets demonstrate the effectiveness of the SRHN method.
Keywords
Weakly-supervised Temporal Action Localization, Snippets Relation Module, Hard-snippets Mask Module, Snippet Enhancement Loss