TransCNNLoc: End-to-end pixel-level learning for 2D-to-3D pose estimation in dynamic indoor scenes

ISPRS Journal of Photogrammetry and Remote Sensing(2024)

引用 0|浏览0
暂无评分
摘要
Accurate localization in GPS-denied environments has always been a core issue in computer vision and robotics research. In indoor environments, vision-based localization methods are susceptible to changes in lighting conditions, viewing angles, and environmental factors, resulting in localization failures or limited generalization capabilities. In this paper, we propose the TransCNNLoc framework, which consists of an encoding–decoding network designed to learn more robust image features for camera pose estimation. In the image feature encoding stage, CNN and Swin Transformer are integrated to construct the image feature encoding module, enabling the network to fully extract global context and local features from images. In the decoding stage, multi-level image features are decoded through cross-layer connections while computing per-pixel feature weight maps. To enhance the framework’s robustness to dynamic objects, a dynamic object recognition network is introduced to optimize the feature weights. Finally, a multi-level iterative optimization from coarse to fine levels is performed to recover six degrees of freedom camera pose. Experiments were conducted on the publicly available 7scenes dataset as well as a dataset collected under changing lighting conditions and dynamic scenes for accuracy validation and analysis. The experimental results demonstrate that the proposed TransCNNLoc framework exhibits superior adaptability to dynamic scenes and lighting changes. In the context of static environments within publicly available datasets, the localization technique introduced in this study attains a maximal precision of up to 5 centimeters, consistently achieving superior outcomes across a majority of the scenarios. Under the conditions of dynamic scenes and fluctuating illumination, this approach demonstrates an enhanced precision capability, reaching up to 3 centimeters. This represents a substantial refinement from the decimeter scale to a centimeter scale in precision, marking a significant advancement over the existing state-of-the-art (SOTA) algorithms. The open-source repository for the method proposed in this paper can be found at the following URL: github.com/Geelooo/TransCNNloc.
更多
查看译文
关键词
Indoor localization,Feature learning,Structure from motion,Levenberg–Marquardt,Image retrieval
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要