Hybrid CNN-ViT architecture to exploit spatio-temporal features for fire recognition trained through transfer learning

Multimedia Tools and Applications (2024)

Abstract
Fires are among the major natural hazards that threaten ecology, economies, and human lives worldwide. Early fire detection systems are therefore crucial to prevent fires from spreading out of control and causing destruction. With the recent surge of interest in deep learning, many vision-based fire detection techniques have evolved that exploit the spatial features of individual images. However, fire can take different forms and scales, and different combustion materials can produce different colors, making accurate fire detection from a single image challenging. Small fires captured by long-distance cameras lack salient features, further complicating detection. This paper proposes a hybrid structure that uses attention-enhanced convolutional neural networks and vision transformers (CNN-ViT) to detect fire. The proposed CNN-ViT first applies spatial attention to every frame and then aggregates temporal contextual information from neighboring frames to improve detection performance. Because of the limited availability of fire training datasets, the study employs deep transfer learning for feature extraction using a pre-trained CNN. We used various metrics to examine the efficacy of the proposed approach. The results show that the CNN-ViT method outperforms previous models based on spatio-temporal characteristics, achieving relative improvements in accuracy and F1 score. Satisfactory results on images contaminated with different intensities of noise confirm the robustness of the approach.
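The abstract describes the pipeline only at a high level, so the following is a minimal sketch of how such a hybrid CNN-ViT could be wired together: a frozen pre-trained CNN backbone for transfer-learned per-frame features, a spatial attention layer that re-weights those features, and a small transformer encoder that aggregates temporal context across neighboring frames. The backbone choice (ResNet-18), layer sizes, and module names are assumptions for illustration, not the authors' exact configuration.

```python
# Hedged sketch of the CNN-ViT idea from the abstract (assumed details, not the paper's exact model).
import torch
import torch.nn as nn
from torchvision import models


class SpatialAttention(nn.Module):
    """Per-location attention map applied to CNN feature maps."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):                       # x: (B, C, H, W)
        attn = torch.sigmoid(self.conv(x))      # (B, 1, H, W) attention map
        return x * attn                         # re-weighted spatial features


class CNNViTFireDetector(nn.Module):
    def __init__(self, d_model=512, num_classes=2):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])  # drop avgpool + fc
        for p in self.cnn.parameters():
            p.requires_grad = False             # transfer learning: keep pre-trained CNN frozen
        self.attention = SpatialAttention(d_model)
        self.pool = nn.AdaptiveAvgPool2d(1)
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.temporal = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.cls = nn.Linear(d_model, num_classes)

    def forward(self, clip):                    # clip: (B, T, 3, H, W) frame sequence
        B, T = clip.shape[:2]
        x = clip.flatten(0, 1)                  # (B*T, 3, H, W)
        feats = self.attention(self.cnn(x))     # spatial attention on every frame
        feats = self.pool(feats).flatten(1)     # (B*T, d_model) one token per frame
        feats = feats.view(B, T, -1)            # (B, T, d_model)
        feats = self.temporal(feats)            # temporal context across neighboring frames
        return self.cls(feats.mean(dim=1))      # clip-level fire / no-fire logits


# Usage: logits = CNNViTFireDetector()(torch.randn(2, 8, 3, 224, 224))
```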
Keywords
Fire recognition, Vision transformers, Deep learning, Disaster management, Small-sized fire