Human Action Recognition Method Based on Motion Excitation and Temporal Aggregation Module

Social Science Research Network (2022)

Abstract
To address the low modeling efficiency and feature loss of temporal modeling in human action recognition, we propose a human action recognition method based on a Motion Excitation and Temporal Aggregation module (META). The method captures multi-state and multi-scale temporal information to achieve effective motion excitation. First, temporal relational sampling is performed on the video frames. Second, META is applied to capture multi-state and multi-scale temporal information. META is composed of a Multi-scale Motion Excitation module (MME) and a Squeeze and Excitation Temporal Aggregation module (SETA). MME captures feature-level temporal differences by transforming the features into the temporal channel, which directly establishes the relationship between the features and the temporal channel and addresses the problem of low modeling efficiency. SETA decomposes the local convolution into a set of sub-convolutions. The sub-convolutions form a hierarchy in which each one also receives the output of the sub-convolution above it, so that they extract features jointly; this enlarges the final temporal receptive field and addresses the problem of feature loss. In addition, optical flow features are extracted through cross-modality pre-training to improve the utilization of temporal information. Finally, the action recognition result is obtained by combining the features of the spatial and temporal streams. Experimental results show that the method achieves accuracies of 96.0% on UCF101 and 71.2% on HMDB-51, which is higher than those of other studies from the same period.
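The two ideas named in the abstract, feature-level temporal differences and hierarchical sub-convolutions, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the sigmoid gating with a residual connection, the one-channel-per-group split, and the fixed kernel are all assumptions made for illustration, and spatial dimensions are omitted entirely.

```python
import math


def temporal_difference(frames):
    """Feature-level temporal difference: frames[t+1] - frames[t].

    frames is a list of T feature vectors (lists of floats).
    The last time step has no successor, so it is padded with zeros.
    """
    diffs = [[b - a for a, b in zip(cur, nxt)]
             for cur, nxt in zip(frames, frames[1:])]
    diffs.append([0.0] * len(frames[0]))
    return diffs


def motion_excitation(frames):
    """Assumed gating: sigmoid of the temporal difference re-weights each
    feature, and a residual connection keeps the original signal."""
    excited = []
    for frame, diff in zip(frames, temporal_difference(frames)):
        excited.append([x + x / (1.0 + math.exp(-d))
                        for x, d in zip(frame, diff)])
    return excited


def temporal_conv3(sequence, kernel):
    """Size-3 temporal convolution with zero padding (output length = input)."""
    padded = [0.0] + list(sequence) + [0.0]
    return [kernel[0] * padded[t] + kernel[1] * padded[t + 1]
            + kernel[2] * padded[t + 2]
            for t in range(len(sequence))]


def hierarchical_temporal_aggregation(channels, kernel):
    """Hierarchy of sub-convolutions over channel groups (one channel per
    group here, an assumption). The first group passes through unchanged;
    each later group adds the previous group's output before its own
    convolution, so the temporal receptive field grows group by group."""
    outputs = [list(channels[0])]
    previous = None
    for sequence in channels[1:]:
        if previous is not None:
            sequence = [a + b for a, b in zip(sequence, previous)]
        previous = temporal_conv3(sequence, kernel)
        outputs.append(previous)
    return outputs


def support_width(sequence):
    """Number of time steps with a non-zero response."""
    return sum(1 for value in sequence if value != 0.0)


# An impulse at the middle time step of every channel makes the growing
# receptive field visible as a widening non-zero region.
impulse = [0.0] * 4 + [1.0] + [0.0] * 4
aggregated = hierarchical_temporal_aggregation([impulse] * 4, [1.0, 1.0, 1.0])
widths = [support_width(seq) for seq in aggregated]
```

With the impulse input, `widths` comes out as `[1, 3, 5, 7]`: each additional sub-convolution in the hierarchy sees two more time steps than the one before it, which is the sense in which sharing the upper result enlarges the final temporal receptive field.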
Keywords
Cross modality pre-training, Human action recognition, Motion Excitation and Temporal Aggregation module, Spatiotemporal two stream network, Temporal modeling, Temporal relational sampling