Modulation-Based Center Alignment and Motion Mining for Spatial Temporal Action Detection

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)(2023)

Abstract
The goal of spatial-temporal action detection is to generate spatially and temporally aligned action tubes. Most existing 2D CNN-based solutions directly aggregate temporally adjacent context across frames without alignment. The resulting misaligned spatial-temporal features can lead to chaotic representations and misaligned action tubes. Moreover, most existing methods fail to exploit motion dependencies efficiently. In this paper, we propose Modulation-based Center Alignment (MCA) and Sparse Valuable Motion Mining (SVMM) for more accurate action detection. First, key-frame-based modulation, built on deformable convolution, aligns the action center across temporal frames; then, motion-region-guided sparse self-attention mines valuable motion. Experimental results on two widely used benchmarks, J-HMDB and UCF101-24, show that our framework significantly outperforms current 2D CNN-based methods.
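The abstract does not give implementation details for the motion-guided sparse self-attention, so the following is only a minimal NumPy sketch of the general idea, not the authors' SVMM module. It assumes that "motion regions" can be approximated by the token positions whose features change most between the key frame and a neighboring frame, and that only those positions serve as keys and values. The function names `motion_topk_indices` and `sparse_self_attention`, the `(N, C)` token layout, and the absence of learned projections are all assumptions made for illustration.

```python
import numpy as np


def motion_topk_indices(key_feat, neighbor_feat, k):
    """Return the k token positions with the largest feature change.

    key_feat, neighbor_feat: arrays of shape (N, C), one row per spatial
    token (hypothetical layout; the paper's layout is not given here).
    """
    # L1 feature difference per token as a crude motion score (assumption).
    motion = np.abs(neighbor_feat - key_feat).sum(axis=1)
    return np.argsort(-motion)[:k]


def sparse_self_attention(tokens, keep_idx):
    """Every token is a query; only tokens in keep_idx are keys/values.

    No learned query/key/value projections are used in this sketch.
    """
    channels = tokens.shape[1]
    keys = tokens[keep_idx]                       # (k, C)
    scores = tokens @ keys.T / np.sqrt(channels)  # (N, k)
    # Numerically stable softmax over the k selected positions.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ keys, weights
```

Restricting keys and values to k selected positions reduces the attention cost from O(N^2) to O(N·k), which is one plausible reading of "sparse" in this context.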
Keywords
Action detection, Action center alignment, Sparse self-attention, Motion mining