FGCT6D: Frequency-Guided CNN-Transformer Fusion Network for Metal Parts' Robust 6D Pose Estimation

IEEE Robotics and Automation Letters (2024)

Abstract
6D pose estimation of metal parts is essential in industrial robotic applications. The homogeneous color, lack of texture, and reflective surfaces of metal parts pose great challenges. CNN-based 6D pose estimation methods have attracted extensive attention. However, these CNN-based methods lack the Transformer's ability to extract low-frequency features and long-range contextual information. In this letter, we explore how to take full advantage of both CNNs and Transformers from a frequency-domain perspective to improve 6D pose estimation for metal parts. Specifically, we propose a frequency-guided CNN-Transformer fusion 6D pose estimation network (FGCT6D). First, we construct a novel pixel attention residual module to strengthen the CNN's attention to high-frequency features, and we design a dual-branch CNN-Transformer encoder in which the Swin Transformer extracts global information and low-frequency features while the CNN captures local information and high-frequency features. Second, we propose a frequency-guided feature fusion module to fuse the extracted multi-spectral features. Third, to make full use of the rich frequency-domain feature representation, we propose a feature fusion decoder with Conv-MSA modules. Additionally, we leverage optimal transport theory, treating dense correspondences as spatial probability distributions, and design an optimal transport loss function. Experiments show that our method extracts rich frequency-domain features and achieves competitive performance on the MP6D and LINEMOD datasets.
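The abstract says only that dense correspondences are treated as spatial probability distributions and compared with an optimal transport loss; it does not give the formulation. As a hedged illustration of that idea, the Python sketch below assumes a predicted and a ground-truth correspondence weight map defined over the same 2D grid, uses a squared-Euclidean ground cost, and computes an entropy-regularized transport cost with Sinkhorn iterations. The helper names (`grid_coords`, `gaussian_blob`, `sinkhorn_ot_loss`) and the values of `eps` and `n_iters` are hypothetical choices for illustration, not taken from FGCT6D.

```python
import numpy as np


def grid_coords(h, w):
    """(h*w, 2) array of pixel locations normalized to [0, 1] (hypothetical helper)."""
    ys, xs = np.meshgrid(np.linspace(0, 1, h), np.linspace(0, 1, w), indexing="ij")
    return np.stack([ys.ravel(), xs.ravel()], axis=1)


def gaussian_blob(coords, cy, cx, sigma=0.1):
    """Toy correspondence weight map: an unnormalized Gaussian centered at (cy, cx)."""
    d2 = ((coords - np.array([cy, cx])) ** 2).sum(axis=1)
    return np.exp(-d2 / (2 * sigma ** 2))


def sinkhorn_ot_loss(p, q, coords, eps=0.05, n_iters=500):
    """Entropy-regularized OT cost between two weight maps on the same grid.

    A hypothetical sketch of an OT-style correspondence loss, not the paper's
    formulation. Returns <plan, cost> for the Sinkhorn transport plan.
    """
    # Treat each weight map as a discrete probability distribution.
    p = p / p.sum()
    q = q / q.sum()
    # Squared Euclidean ground cost between every pair of grid locations.
    diff = coords[:, None, :] - coords[None, :, :]
    cost = (diff ** 2).sum(axis=-1)
    # Gibbs kernel; Sinkhorn alternately rescales rows and columns so the
    # plan's marginals approach p (rows) and q (columns).
    K = np.exp(-cost / eps)
    u = np.ones_like(p)
    v = np.ones_like(q)
    for _ in range(n_iters):
        u = p / (K @ v)
        v = q / (K.T @ u)
    plan = u[:, None] * K * v[None, :]
    # Total transport cost of the regularized plan.
    return float((plan * cost).sum())


# Usage: a slightly shifted prediction should cost less than a distant one.
coords = grid_coords(8, 8)
gt = gaussian_blob(coords, 0.3, 0.3)
near = gaussian_blob(coords, 0.35, 0.35)
far = gaussian_blob(coords, 0.7, 0.7)
print(sinkhorn_ot_loss(near, gt, coords), sinkhorn_ot_loss(far, gt, coords))
```

In this toy setup the cost grows with how far probability mass has to move, so a prediction that is slightly off is penalized less than one that is far off, even when neither overlaps the target exactly. A training implementation would express the same iterations in an autodiff framework so that gradients flow back to the predicted map.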
Keywords
Deep learning for visual perception, visual learning, computer vision for automation