SRNet: Scale-Aware Representation Learning Network for Dense Crowd Counting
IEEE Access (2021)
Abstract
Large variations in the scale of people within images make dense crowd counting an extremely challenging task. Many researchers currently apply multi-column structures to address the scale variation problem. However, multi-column networks are typically complex, contain large numbers of parameters, and are difficult to optimize. To this end, we propose a scale-aware representation learning network (SRNet) built on a widely used encoder-decoder framework. In the encoder, an image is converted into deep features by the first ten layers of VGG16. The decoder then regresses these features into a crowd density map. The decoder consists mainly of two modules: the scale-aware feature learning module (SAM) and the pixel-aware upsampling module (PAM). SAM models the multi-scale features of a crowd at each level using receptive fields of different sizes, while PAM enlarges the spatial resolution and enhances pixel-level semantic information, thereby improving overall counting accuracy. We conduct extensive crowd counting experiments on the ShanghaiTech Part_A, UCF-QNRF, and UCF_CC_50 datasets. Furthermore, to obtain the location of each person, we conduct crowd localization experiments on the UCF-QNRF and NWPU-Crowd datasets. The qualitative and quantitative results demonstrate the effectiveness of SRNet on dense crowd counting and crowd localization tasks.
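The encoder-decoder pipeline described above can be illustrated with a minimal PyTorch sketch. This is only a plausible reading of the abstract, not the authors' implementation: the branch kernel sizes in SAM, the bilinear upsampling in PAM, and the channel widths are all assumptions, since the abstract only states that SAM uses "different sizes of receptive fields" and that PAM enlarges spatial resolution.

```python
# Hedged sketch of an SRNet-style decoder stage; module internals are assumptions.
import torch
import torch.nn as nn

class SAM(nn.Module):
    """Scale-aware module: parallel conv branches with different receptive
    fields (the kernel sizes 1/3/5/7 are illustrative assumptions)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        branch_ch = out_ch // 4
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, branch_ch, k, padding=k // 2) for k in (1, 3, 5, 7)
        )
        self.fuse = nn.Conv2d(branch_ch * 4, out_ch, 1)  # merge multi-scale features

    def forward(self, x):
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        return torch.relu(self.fuse(feats))

class PAM(nn.Module):
    """Pixel-aware upsampling: 2x upsample followed by a refining conv
    (one plausible design, not the exact module from the paper)."""
    def __init__(self, ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.refine = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        return torch.relu(self.refine(self.up(x)))

# Toy forward pass: stand-in for VGG16 encoder features, then SAM -> PAM,
# finally regressing a single-channel density map.
x = torch.randn(1, 512, 32, 32)
y = PAM(256)(SAM(512, 256)(x))
density = nn.Conv2d(256, 1, 1)(y)
print(tuple(density.shape))  # (1, 1, 64, 64): upsampled 1-channel density map
```

Summing the density map over all pixels would yield the estimated crowd count, which is the standard evaluation protocol for density-map-based counters.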
Key words
Feature extraction, task analysis, convolution, estimation, decoding, semantics, location awareness, dense crowd counting, multi-scale feature learning, deep learning, convolutional neural network