An end-to-end framework for real-time violent behavior detection based on 2D CNNs

Journal of Real-Time Image Processing(2024)

引用 0|浏览3
暂无评分
摘要
Violent behavior detection (VioBD), as a special action recognition task, aims to detect violent behaviors in videos, such as mutual fighting and assault. Some progress has been made in the research of violence detection, but the existing methods have poor real-time performance and the algorithm performance is limited by the interference of complex backgrounds and the occlusion of dense crowds. To solve the above problems, we propose an end-to-end real-time violence detection framework based on 2D CNNs. First, we propose a lightweight skeletal image (SI) as the input modality, which can obtain the human body posture information and richer contextual information, and at the same time remove the background interference. As tested, at the same accuracy, the resolution of SI modality is only one-third of that of RGB modality, which greatly improves the real-time performance of model training and inference, and at the same resolution, SI modality has higher inaccuracy. Second, we also design a parallel prediction module (PPM), which can simultaneously obtain the single image detection results and the inter-frame motion information of the video, which can improve the real-time performance of the algorithm compared with the traditional “detect the image first, understand the video later" mode. In addition, we propose an auxiliary parameter generation module (APGM) with both efficiency and accuracy, APGM is a 2D CNNs-based video understanding module for weighting the spatial information of the video features, processing speed can reach 30–40 frames per second, and compared with models such as CNN-LSTM (Iqrar et al., Aamir: Cnn-lstm based smart real-time video surveillance system. In: 2022 14th International Conference on Mathematics, Actuarial, Science, Computer Science and Statistics (MACS), pages 1–5. IEEE, 2022) and Ludl et al. (Cristóbal: Simple yet efficient real-time pose-based action recognition. In: 2019 IEEE Intelligent Transportation Systems Conference (ITSC), pages 581–588. IEEE, 1999), the propagation effect speed can be increased by an average of 3 ∼ 20 frames per second per group of clips, which further improves the video motion detection efficiency and accuracy, greatly improving real-time performance. We conducted experiments on some challenging benchmarks, and RVBDN can maintain excellent speed and accuracy in long-term interactions, and are able to meet real-time requirements in methods for violence detection and spatio-temporal action detection. Finally, we update our proposed new dataset on violence detection images (violence image dataset). Dataset is available at https://github.com/ChinaZhangPeng/Violence-Image-Dataset
更多
查看译文
关键词
Violent behavior detection,Real-time action recognition,Modified YOLO-Pose,Object detection
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要