Efficient Human Pose Estimation by Learning Deeply Aggregated Representations

ICME (2021)

Abstract
In this paper, we propose an efficient human pose estimation network (DANet) that learns deeply aggregated representations. Most existing models explore multi-scale information mainly from features with different spatial sizes. Powerful multi-scale representations usually rely on the cascaded pyramid framework, which largely boosts performance but also makes networks very deep and complex. Instead, we focus on exploiting multi-scale information from layers with different receptive-field sizes and then making full use of this information through an improved fusion method. Specifically, we propose an orthogonal attention block (OAB) and a second-order fusion unit (SFU). The OAB learns multi-scale information from different layers and enhances it by encouraging diversity. The SFU adaptively selects and fuses diverse multi-scale information and suppresses the redundant parts. With the help of the OAB and SFU, our networks achieve comparable or even better accuracy with much lower model complexity. Specifically, our DANet-72 achieves 71.0 AP on COCO val2017 with only 1.0 GFLOPs. Its speed on a CPU platform reaches 58 Persons-Per-Second (PPS).
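The abstract's key fusion idea, adaptively weighting multi-scale branches and suppressing redundant ones, can be illustrated with a minimal NumPy sketch. This is not the paper's SFU implementation; the function names, shapes, and per-channel softmax gating below are illustrative assumptions about how such an adaptive selection step could look.

```python
import numpy as np

def softmax(x, axis=0):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_branches(branches, gate_logits):
    """Adaptively fuse multi-scale branch features (illustrative sketch).

    branches: list of (C, H, W) feature maps from layers with different
              receptive-field sizes (hypothetical shapes).
    gate_logits: (n_branches, C) array of learned gating logits.
    Per-channel softmax weights emphasize the most informative branch
    and down-weight (suppress) redundant ones before summation.
    """
    stack = np.stack(branches)               # (n, C, H, W)
    attn = softmax(gate_logits, axis=0)      # (n, C), sums to 1 over branches
    return (attn[:, :, None, None] * stack).sum(axis=0)

# toy usage: two branches, 4 channels, 8x8 spatial maps
rng = np.random.default_rng(0)
b = [rng.standard_normal((4, 8, 8)) for _ in range(2)]
w = rng.standard_normal((2, 4))
fused = fuse_branches(b, w)
print(fused.shape)  # (4, 8, 8)
```

In the paper the gating would be produced by learned layers and trained end-to-end; here the logits are random placeholders just to show the weighted-fusion mechanics.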
Key words
Efficient human pose estimation, orthogonal attention, second-order fusion