Mixed Resolution Network with hierarchical motion modeling for efficient action recognition

Knowledge-Based Systems (2024)

Abstract
Dual-stream architectures are frequently employed to learn diverse features from videos. This paper introduces a Mixed Resolution Network (MixRes) that processes inputs with mixed spatiotemporal resolutions: one input with high spatial and low temporal resolution, and one with low spatial and high temporal resolution. Using mixed spatiotemporal resolutions lets the two streams focus independently on appearance and motion encoding, and also reduces the computational burden. Furthermore, by exploiting the multi-layer structure of neural networks, the temporal stream of the proposed network is divided into successive stages that capture short-term and long-term motion information. Finally, we design a Temporal Multiscale Motion Excitation (TMME) module, which enhances the motion-related channels of the video representation using multiscale temporal differences. We conduct extensive experiments on several action recognition benchmarks, including Something-Something V1 & V2 and Kinetics-400. The results show that the proposed method achieves superior action recognition performance at low computational cost compared with state-of-the-art methods.
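To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It builds a high-spatial, low-temporal input and a low-spatial, high-temporal input from one clip, then computes temporal differences at several scales. The function names, the strides (4 and 2), and the difference scales (1, 2, 4) are assumptions chosen for illustration. The abstract does not specify them, and the channel-excitation (gating) part of TMME is not modeled here.

```python
import numpy as np


def mixed_resolution_inputs(clip, temporal_stride=4, spatial_stride=2):
    """Split a clip of shape (T, H, W, C) into two illustrative stream inputs.

    The stride values are assumptions, not values taken from the paper.
    """
    # Appearance input: fewer frames, full spatial resolution.
    appearance = clip[::temporal_stride]
    # Motion input: all frames, spatially subsampled by plain striding
    # (no anti-aliasing filter, for brevity).
    motion = clip[:, ::spatial_stride, ::spatial_stride]
    return appearance, motion


def multiscale_temporal_differences(frames, scales=(1, 2, 4)):
    """Return {scale: frames[t + scale] - frames[t]} for each scale."""
    return {s: frames[s:] - frames[:-s] for s in scales}


# Random stand-in for a 16-frame, 64x64 RGB clip.
clip = np.random.default_rng(0).random((16, 64, 64, 3)).astype(np.float32)
appearance, motion = mixed_resolution_inputs(clip)
diffs = multiscale_temporal_differences(motion)

print(appearance.shape)  # (4, 64, 64, 3)
print(motion.shape)      # (16, 32, 32, 3)
print({s: d.shape for s, d in diffs.items()})
```

With these assumed strides, the motion input holds 16 x 32 x 32 pixel positions and the appearance input 4 x 64 x 64, so the two inputs together carry half as many pixels as feeding the full-resolution clip to both streams. This illustrates why mixing resolutions can lower cost.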
Keywords
Action recognition, Two-stream network, Mixed spatiotemporal resolution input, Hierarchical motion modeling