Pose Relation Transformer Refine Occlusions for Human Pose Estimation

2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA(2023)

引用 0|浏览14
暂无评分
摘要
Accurately estimating the human pose is an essential task for many applications in robotics. However, existing pose estimation methods suffer from poor performance when occlusion occurs. Recent advances in NLP have been very successful in predicting the missing words conditioned on visible words. We draw upon the sentence completion analogy in NLP to guide our model to address occlusions in the pose estimation problem. We propose a novel approach that can mitigate the effect of occlusions motivated by the sentence completion task of NLP. In an analogous manner, we designed our model to reconstruct occluded joints given the visible joints utilizing joint correlations by capturing the implicit joint connectivity through the attention mechanism. In this work, we propose a POse Relation Transformer (PORT) that captures the global context of the pose using self-attention and a local context by aggregating adjacent joint features. To supervise PORT in learning joint correlations, we guide PORT to reconstruct randomly masked joints, which we call Masked Joint Modeling (MJM). PORT trained with MJM adds to existing keypoint detection methods and successfully refines occlusions. Notably, PORT is a model-agnostic plug-and-play module for pose refinement under occlusion that can be plugged into any keypoint detector with substantially low computational costs. We conducted extensive experiments to demonstrate the advantage of PORT mitigating the occlusion on the hand and body pose PORT improves the pose estimation accuracy of existing human pose estimation methods by up to 16% with only 5% of additional parameters. The code is publicly available at https://github.com/stnoah1/PORT.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要