Recurrent Fine-Grained Self-Attention Network for Video Crowd Counting

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Abstract
Striking a balance between exploring spatio-temporal correlations and controlling model complexity is vital for video-based crowd counting methods. In this paper, we propose a Recurrent Fine-Grained Self-Attention Network (RFSNet) to achieve efficient and accurate counting in video scenes via the self-attention mechanism and a recurrent fine-tuning strategy. Specifically, we design a decoder that consists of patch-wise spatial self-attention and temporal self-attention. Compared with vanilla self-attention, it effectively leverages dependencies in the spatial and temporal domains, respectively, while significantly reducing computational complexity. Moreover, the RFSNet recurrently feeds the features into the decoder to enhance the spatio-temporal representations. This strategy not only simplifies the model structure and reduces the number of parameters, but also improves the quality of the estimated density maps. Our RFSNet achieves state-of-the-art performance on three video crowd counting benchmarks and outperforms other methods by more than 20% on the challenging FDST dataset.
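The complexity saving comes from factorizing attention: joint self-attention over all T frames of N patches costs O((TN)^2) per layer, whereas attending over patches within each frame and then over frames at each patch position costs O(T·N^2 + N·T^2). The sketch below illustrates this factorization and the recurrent refinement in PyTorch; it is a minimal illustration of the idea only, and all names (SpatioTemporalDecoder, refine, num_iters) are hypothetical, not the authors' implementation.

```python
# Minimal sketch of factorized spatial/temporal self-attention with
# recurrent refinement. Assumes PyTorch; names are illustrative.
import torch
import torch.nn as nn


class SpatioTemporalDecoder(nn.Module):
    """Spatial attention over each frame's patches, then temporal
    attention across frames at each patch location."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, N, C) -- batch, frames, patches per frame, channels
        B, T, N, C = x.shape

        # Spatial self-attention: attend over the N patches of each frame.
        s = x.reshape(B * T, N, C)
        h = self.norm1(s)
        s = s + self.spatial_attn(h, h, h)[0]

        # Temporal self-attention: attend over the T frames at each patch.
        t = s.reshape(B, T, N, C).permute(0, 2, 1, 3).reshape(B * N, T, C)
        h = self.norm2(t)
        t = t + self.temporal_attn(h, h, h)[0]

        return t.reshape(B, N, T, C).permute(0, 2, 1, 3)  # (B, T, N, C)


def refine(decoder: SpatioTemporalDecoder, feats: torch.Tensor,
           num_iters: int = 3) -> torch.Tensor:
    # Recurrent refinement: reuse one decoder several times rather than
    # stacking separate layers, so the parameter count stays fixed.
    for _ in range(num_iters):
        feats = decoder(feats)
    return feats
```

Reusing a single decoder across iterations, rather than stacking depth, is what lets the recurrent strategy deepen the refinement without growing the parameter count.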
Keywords
Crowd counting, temporal modeling, density map regression, self-attention