Multi-person pose estimation using atrous convolution

Electronics Letters (2019)

Abstract
Human keypoint localisation has improved greatly with the development of deep neural networks. In particular, recent methods that exploit multi-scale features and cascaded networks have achieved accurate prediction of multi-person keypoints. These methods typically extract low-resolution feature maps with a classical backbone and then generate heatmaps through upsampling. However, consecutive striding is harmful for keypoint localisation because detail information is decimated. In this Letter, the authors present a novel network structure that uses atrous spatial pyramid pooling to generate keypoint predictions. First, atrous convolution is used in the backbone to expand the receptive field while maintaining the scale of the feature map. The feature map therefore keeps its size, which avoids removing too much detail. Second, multi-scale features are extracted using an atrous spatial pyramid pooling module to enrich the scale information of the obtained features. Finally, instead of upsampling, deconvolutional layers are applied to construct the output heatmaps. State-of-the-art results are achieved on the MS COCO 2017 keypoint database.
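The trade-off between striding and atrous (dilated) convolution described above can be illustrated with the standard convolution size arithmetic. The following is a minimal sketch, not the authors' implementation: the function names, the 64-pixel input, and the dilation rates (1, 2, 4) are hypothetical values chosen for illustration, since the abstract does not give the actual configuration.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: dilation * (k - 1) + 1."""
    return dilation * (kernel_size - 1) + 1


def conv_output_size(size: int, kernel_size: int, stride: int = 1,
                     padding: int = 0, dilation: int = 1) -> int:
    """Standard output-length formula for a convolution along one axis."""
    span = effective_kernel_size(kernel_size, dilation)
    return (size + 2 * padding - span) // stride + 1


# Hypothetical 64-pixel-wide feature map and 3x3 kernels (illustrative only).
width = 64

# A strided convolution halves the feature map, discarding spatial detail.
strided = conv_output_size(width, 3, stride=2, padding=1)

# An atrous convolution with stride 1 and padding equal to the dilation rate
# keeps the map size while covering a wider span of the input.
atrous = conv_output_size(width, 3, stride=1, padding=2, dilation=2)

# Parallel branches with different (assumed) rates, as in a pyramid-style
# module: each branch keeps the resolution but sees a different span.
branches = {
    rate: (conv_output_size(width, 3, padding=rate, dilation=rate),
           effective_kernel_size(3, rate))
    for rate in (1, 2, 4)
}

print(strided, atrous, branches)
```

Under these assumptions, the strided layer shrinks the map while every atrous branch preserves its width and sees a progressively larger span, which is the property the abstract relies on for keeping detail.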
Keywords
object recognition, neural nets, feature extraction, object detection, image resolution, image representation, pose estimation, computer vision