DTCM: Joint Optimization of Dark Enhancement and Action Recognition in Videos

IEEE Transactions on Image Processing (2023)

Abstract
Recognizing human actions in dark videos is a useful yet challenging visual task in practice. Existing enhancement-based methods separate action recognition and dark enhancement into a two-stage pipeline, which leads to inconsistent learning of temporal representations for action recognition. To address this issue, we propose a novel end-to-end framework termed the Dark Temporal Consistency Model (DTCM), which jointly optimizes dark enhancement and action recognition and enforces temporal consistency to guide downstream dark feature learning. Specifically, DTCM cascades the action classification head with the dark enhancement network to perform dark video action recognition in a one-stage pipeline. Our proposed spatio-temporal consistency loss, which uses the RGB-Difference of dark video frames to encourage temporal coherence of the enhanced video frames, is effective for boosting spatio-temporal representation learning. Extensive experiments demonstrate that DTCM achieves remarkable performance: 1) competitive accuracy, outperforming the state of the art by 2.32% on the ARID dataset and by 4.19% on the UAVHuman-Fisheye dataset; 2) high efficiency, surpassing the current most advanced method [1] with only 6.4% of its GFLOPs and 71.3% of its parameters; 3) strong generalization, as it can be combined with various action recognition methods (e.g., TSM, I3D, 3D-ResNext-101, Video-Swin) to significantly improve their performance.
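The abstract describes a spatio-temporal consistency loss built on the RGB-Difference (frame-to-frame change) of the dark input, used to keep the enhanced frames temporally coherent. The paper's exact formulation is not given here; the sketch below shows one plausible L1 form of such a loss, with `rgb_difference` and `temporal_consistency_loss` as hypothetical names introduced for illustration.

```python
import numpy as np

def rgb_difference(frames):
    """RGB-Difference: per-pixel change between consecutive frames.

    frames: array of shape (T, H, W, C) — T frames of an RGB clip.
    Returns an array of shape (T-1, H, W, C).
    """
    return frames[1:] - frames[:-1]

def temporal_consistency_loss(dark_frames, enhanced_frames):
    """Illustrative consistency loss (an assumption, not the paper's exact loss).

    Penalizes disagreement between the motion signal (RGB-Difference)
    of the dark input and that of the enhanced output, so that
    enhancement does not introduce flicker or temporally inconsistent
    artifacts across frames.
    """
    diff_dark = rgb_difference(dark_frames)
    diff_enhanced = rgb_difference(enhanced_frames)
    return float(np.mean(np.abs(diff_enhanced - diff_dark)))
```

Under this form, a uniform brightening of every frame leaves the frame-to-frame differences unchanged and incurs zero penalty, while per-frame flicker introduced by the enhancer is penalized.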
Keywords
dark enhancement, action recognition