Efficient Human Pose Estimation by Learning Deeply Aggregated Representations

arxiv(2021)

Abstract
In this paper, we propose an efficient human pose estimation network (DANet) that learns deeply aggregated representations. Most existing models explore multi-scale information mainly from features with different spatial sizes. Powerful multi-scale representations usually rely on the cascaded pyramid framework. This framework substantially boosts performance but also makes networks very deep and complex. Instead, we focus on exploiting multi-scale information from layers with different receptive-field sizes and then making full use of this information by improving the fusion method. Specifically, we propose an orthogonal attention block (OAB) and a second-order fusion unit (SFU). The OAB learns multi-scale information from different layers and enhances these representations by encouraging them to be diverse. The SFU adaptively selects and fuses diverse multi-scale information and suppresses redundant information. With the help of the OAB and SFU, our networks achieve comparable or even better accuracy with much smaller model complexity. In particular, our DANet-72 achieves an AP of 71.0 on COCO val2017 with only 1.0G FLOPs. Its speed on a CPU platform reaches 58 Persons-Per-Second (PPS).
Keywords
Efficient human pose estimation, orthogonal attention, second-order fusion