TransNet: Parallel encoder architecture for human pose estimation

Smart Health (2023)

Abstract
Recently, self-attention mechanisms have become increasingly popular in computer vision applications, following the success of the transformer in natural language processing. Yet transformers remain under-appreciated compared with convolutional neural networks, which dominate the field of computer vision. In this study, we present various transformer approaches and their application to human pose estimation. We propose a novel model, TransNet, that combines a convolutional neural network design with a parallel transformer encoder branch, capturing long-range spatial dependencies and fusing them with the local features extracted from the input images. Experimental results show that TransNet achieves exceptional performance for human pose estimation on the COCO dataset. The proposed model outperforms competing methods and achieves an Average Precision (AP) score of 78.3 on the COCO val set. Specifically, it shows a significant improvement in average score over advanced convolutional neural networks. We believe this research can contribute to a better understanding of transformers within computer vision models.
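The parallel-branch idea described in the abstract can be illustrated with a toy sketch. This is not the TransNet implementation: the abstract gives no layer details, so the single-channel feature map, the 3x3 mean filter standing in for a convolution, the unlearned single-head self-attention, and fusion by element-wise addition are all assumptions made for illustration, and every function name below is hypothetical.

```python
import math


def local_branch(feat):
    """Stand-in for a convolutional branch: 3x3 mean filter (assumed, no learned weights)."""
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Each output value only sees its immediate neighbourhood.
            vals = [feat[a][b]
                    for a in range(max(0, i - 1), min(h, i + 2))
                    for b in range(max(0, j - 1), min(w, j + 2))]
            out[i][j] = sum(vals) / len(vals)
    return out


def global_branch(feat):
    """Stand-in for a transformer encoder branch: single-head self-attention
    over all positions, with scalar features and no learned projections (assumed)."""
    h, w = len(feat), len(feat[0])
    tokens = [feat[i][j] for i in range(h) for j in range(w)]
    out = []
    for q in tokens:
        scores = [q * k for k in tokens]             # query-key similarity
        m = max(scores)                              # subtract max for a stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        # Each output value is a softmax-weighted average over ALL positions.
        out.append(sum(wt * v for wt, v in zip(weights, tokens)) / z)
    return [out[i * w:(i + 1) * w] for i in range(h)]


def parallel_block(feat):
    """Run both branches on the same input and fuse by element-wise addition (assumed)."""
    loc = local_branch(feat)
    glo = global_branch(feat)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(loc, glo)]


if __name__ == "__main__":
    base = [[1.0] * 4 for _ in range(4)]
    changed = [row[:] for row in base]
    changed[3][3] = 2.0  # perturb the corner farthest from position (0, 0)
    # The local branch at (0, 0) is unaffected; the global branch is not.
    print(local_branch(base)[0][0], local_branch(changed)[0][0])
    print(global_branch(base)[0][0], global_branch(changed)[0][0])
```

Perturbing a distant position leaves the local branch's output at (0, 0) unchanged but shifts the global branch's output there, which is the long-range dependency the parallel encoder branch is meant to contribute before fusion.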
Keywords
parallel encoder architecture,human