Rethinking the Person Localization for Single-Stage Multi-Person Pose Estimation

IEEE TRANSACTIONS ON MULTIMEDIA (2024)

Abstract
Single-stage models for multi-person pose estimation have garnered significant attention because they perform person localization and body structure perception in a single pass. Existing methods, however, process these two parts independently, which leads to suboptimal results, e.g., candidates with high person-localization confidence but poor structure estimates. To this end, we propose a simple yet effective approach, Structure-guided Person Localization (SPL), which jointly leverages the advantages of both aspects to solve the multi-person pose estimation problem, with two complementary novelties. First, we incorporate body structure perception to guide person localization. To that end, we introduce Structure-guided Center Learning (SCL), which unifies the quality of the body structure perception in the displacement map with the confidence of person existence in the center map, thus achieving more accurate keypoint localization even for extreme poses. Second, to facilitate end-to-end training of SPL, we propose the efficient Agency-based Scale-adaptive Learning (ASL). Specifically, we predict an agency map of the same size as the center map; it focuses on the foreground area and adaptively adjusts the scale of each central area according to the body structure perception confidence. Comprehensive experiments on challenging benchmarks, including COCO and CrowdPose, verify the superiority of our framework, which achieves new state-of-the-art results for single-stage multi-person pose estimation. Specifically, SPL obtains 72.1 AP on COCO test-dev2017 and 69.5 AP on the CrowdPose test set.
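To make the roles of the center map and the displacement map concrete, the following is a minimal sketch of generic center-plus-displacement decoding, the kind of single-stage pipeline the abstract builds on. It is not the authors' SPL implementation: the function name `decode_poses`, the nested-list map layout, the 3x3 peak test, and the 0.5 threshold are all assumptions chosen for illustration. Candidates are ranked by center confidence alone, which is the decoupling the abstract identifies as a weakness.

```python
from typing import List, Tuple

Keypoint = Tuple[float, float]
Pose = Tuple[float, List[Keypoint]]


def decode_poses(center_map, displacement_map, threshold=0.5) -> List[Pose]:
    """Decode poses from a center map and a displacement map (illustrative only).

    center_map[y][x]       -> person-existence confidence at that pixel
    displacement_map[y][x] -> list of (dx, dy) offsets, one per keypoint
    Both layouts are assumptions for this sketch, not the paper's format.
    """
    height, width = len(center_map), len(center_map[0])
    poses: List[Pose] = []
    for y in range(height):
        for x in range(width):
            score = center_map[y][x]
            if score < threshold:
                continue
            # Keep only local maxima within the 3x3 neighborhood.
            neighbors = [
                center_map[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            ]
            if any(n > score for n in neighbors):
                continue
            # Each keypoint is the center position plus its predicted offset.
            keypoints = [(x + dx, y + dy) for dx, dy in displacement_map[y][x]]
            poses.append((score, keypoints))
    # Rank by center confidence only; structure quality is ignored here.
    poses.sort(key=lambda pose: pose[0], reverse=True)
    return poses


# Toy input: a 3x4 grid with one confident center and two keypoints.
toy_center = [[0.0] * 4 for _ in range(3)]
toy_center[1][2] = 0.9
toy_disp = [[[(0.0, 0.0), (0.0, 0.0)] for _ in range(4)] for _ in range(3)]
toy_disp[1][2] = [(-1.0, 0.5), (1.0, -1.0)]
toy_poses = decode_poses(toy_center, toy_disp)
```

In this toy input the single peak at (x=2, y=1) yields one pose whose keypoints are the center shifted by the two offsets. Nothing in the ranking step looks at how plausible those offsets are, so a confident center with a poor displacement prediction would still rank first.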
Keywords
Computer vision, multi-person pose estimation, single-stage