Transformer-based rapid human pose estimation network

Computers & Graphics (2023)

Abstract
Most current human pose estimation methods pursue high accuracy with large models and intensive computation, which makes them slow. Their high memory and computational costs prevent these methods from being adopted effectively in real applications. To achieve a trade-off between accuracy and efficiency, we propose TRPose, a Transformer-based network for rapid human pose estimation. TRPose seamlessly combines an early convolutional stage with a later Transformer stage. Concretely, the convolutional stage forms a Rapid Fusion Module (RFM), which efficiently acquires multi-scale features via three parallel convolution branches. The Transformer stage uses multi-resolution Transformers to build a Dual-scale Encoder Module (DEM), which learns long-range dependencies among the whole set of human skeletal keypoints from features at different scales. Experiments show that TRPose reaches 74.3 AP and 73.8 AP on the COCO validation and test-dev sets at 170+ FPS on a GTX 2080Ti, a better efficiency-effectiveness trade-off than most state-of-the-art methods. Our model also outperforms mainstream Transformer-based architectures on the MPII dataset, yielding an 89.9 PCK@0.5 score on the val set without extra data.
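The abstract only outlines the two-stage design. Below is a minimal, illustrative PyTorch sketch of a layout matching that description: a three-branch convolutional fusion stage followed by Transformer encoders over two feature scales. All module names, channel widths, strides, and depths here are assumptions made for illustration, not the paper's actual configuration.

```python
# Illustrative sketch of the two-stage layout described in the abstract.
# Module names, channel widths, strides, and depths are assumptions.
import torch
import torch.nn as nn


class RapidFusionModule(nn.Module):
    """Three parallel convolution branches that produce fused multi-scale features."""

    def __init__(self, in_ch=3, width=64):
        super().__init__()
        # Branches at three resolutions (strides are illustrative guesses).
        self.branch_hi = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, stride=2, padding=1), nn.ReLU())
        self.branch_mid = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, stride=4, padding=1), nn.ReLU())
        self.branch_lo = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, stride=8, padding=1), nn.ReLU())

    def forward(self, x):
        hi, mid, lo = self.branch_hi(x), self.branch_mid(x), self.branch_lo(x)
        # Fuse the finest branch into the middle one; keep two scales for the encoder.
        mid = mid + nn.functional.max_pool2d(hi, kernel_size=2)
        return mid, lo


class DualScaleEncoder(nn.Module):
    """Transformer encoders applied independently to two feature resolutions."""

    def __init__(self, dim=64, depth=4, heads=8):
        super().__init__()
        def make_encoder():
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
            return nn.TransformerEncoder(layer, num_layers=depth)
        self.enc_mid, self.enc_lo = make_encoder(), make_encoder()

    @staticmethod
    def run(encoder, feat):
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)   # (B, H*W, C) token sequence
        out = encoder(tokens)                      # global self-attention over all positions
        return out.transpose(1, 2).reshape(b, c, h, w)

    def forward(self, mid, lo):
        return self.run(self.enc_mid, mid), self.run(self.enc_lo, lo)


if __name__ == "__main__":
    x = torch.randn(1, 3, 256, 192)                # common pose-estimation input size
    rfm, dem = RapidFusionModule(), DualScaleEncoder()
    mid, lo = rfm(x)
    mid, lo = dem(mid, lo)
    print(mid.shape, lo.shape)                     # (1, 64, 64, 48) and (1, 64, 32, 24)
```

In the actual model, a keypoint-heatmap head would follow the encoder outputs; it is omitted here since the abstract gives no details about it.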
Keywords
Transformer architecture, Human pose estimation, Inference speed, Computational cost