Effective Video Summarization by Extracting Parameter-Free Motion Attention

Tingting Han, Quan Zhou, Jun Yu, Zhou Yu, Jianhui Zhang, Sicheng Zhao

ACM Trans. Multim. Comput. Commun. Appl. (2024)

Abstract
Video summarization remains a challenging task despite increasing research effort. Traditional methods focus solely on long-range temporal modeling of video frames, overlooking the local motion information that frame-level video representations cannot capture. In this article, we propose the Parameter-free Motion Attention Module (PMAM), a multi-head attention architecture that exploits the crucial motion cues contained in adjacent video frames. The PMAM requires no additional trainable parameters, leading to an efficient and effective understanding of video dynamics. Moreover, we introduce the Multi-feature Motion Attention Network (MMAN), which integrates the PMAM with local and global multi-head attention over object-centric and scene-centric video representations. Combining the local motion information extracted by the PMAM with the long-range interactions modeled by the local and global multi-head attention significantly enhances summarization performance. Extensive experiments on the benchmark SumMe and TVSum datasets demonstrate that the proposed MMAN outperforms other state-of-the-art methods with notable performance gains.
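The abstract describes the PMAM only at a high level. Below is a minimal NumPy sketch of one way a parameter-free, temporally windowed multi-head attention over frame features could work; it is illustrative only. The function name, the scaled dot-product scoring on raw features (with no learned Q/K/V projections), the window radius, the head count, and the residual connection are assumptions, not the paper's exact formulation.

```python
import numpy as np

def parameter_free_motion_attention(frames, num_heads=4, window=2):
    """Hypothetical sketch: parameter-free windowed multi-head attention.

    frames: (T, D) array of frame-level features.
    Attention weights come directly from the raw features (no learned
    projections) and are restricted to a local temporal window so that
    only adjacent frames interact, mimicking local motion modeling.
    """
    T, D = frames.shape
    assert D % num_heads == 0, "feature dim must split evenly across heads"
    d_h = D // num_heads
    # Split features into heads: (num_heads, T, d_h).
    heads = frames.reshape(T, num_heads, d_h).transpose(1, 0, 2)

    idx = np.arange(T)
    # Frames farther apart than `window` do not attend to each other.
    mask = np.abs(idx[:, None] - idx[None, :]) > window

    out = np.empty_like(heads)
    for h in range(num_heads):
        x = heads[h]                              # (T, d_h)
        scores = x @ x.T / np.sqrt(d_h)           # scaled dot-product
        scores[mask] = -np.inf                    # keep attention local
        # Numerically stable row-wise softmax.
        scores -= scores.max(axis=1, keepdims=True)
        w = np.exp(scores)
        w /= w.sum(axis=1, keepdims=True)
        out[h] = w @ x                            # attention-weighted sum

    # Merge heads back to (T, D); a residual keeps the original features.
    return frames + out.transpose(1, 0, 2).reshape(T, D)

# Example: 8 frames with 16-dim features.
feats = np.random.randn(8, 16).astype(np.float32)
enhanced = parameter_free_motion_attention(feats, num_heads=4, window=1)
print(enhanced.shape)  # (8, 16)
```

Because no projection matrices are learned, the module adds zero trainable parameters; locality comes purely from the window mask, which is one plausible reading of how "motion attention over adjacent frames" could be realized.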
Keywords
Video summarization, parameter-free, motion attention, feature fusion, multi-head attention