Absolute Monocular Depth Estimation on Robotic Visual and Kinematics Data via Self-Supervised Learning

IEEE Transactions on Automation Science and Engineering (2024)

Abstract
Accurate estimation of absolute depth from a monocular endoscope is a fundamental task for automatic navigation systems in robotic surgery. Previous works rely solely on uni-modal data (i.e., monocular images) and can therefore estimate depth only up to an unknown scale relative to the real world. In this paper, we present a novel framework, SADER, which exploits vision and robot kinematics to estimate high-quality absolute depth for monocular surgical scenes. To learn jointly from the multi-modal data, we introduce a two-stage training policy based on self-distillation. In the first stage, a boosting depth module based on a vision transformer is proposed to improve the relative depth estimation network, which is trained in a self-supervised manner. We then develop an algorithm that automatically computes the scale from robot kinematics. Coupling this scale with the relative depth yields pseudo absolute depth labels for all images. In the second stage, we re-train the network with a 3D loss supervised by the pseudo labels. To make our method generalize to different endoscopes, the learning of endoscopic intrinsics is integrated into the network. In addition, we conducted cadaver experiments to collect new surgical depth estimation data from robotic laparoscopy for evaluation. Experimental results on the public SCARED dataset and the cadaver data demonstrate that SADER outperforms previous state-of-the-art methods, including stereo-based ones, with an error under 1.90 mm, proving the feasibility of our approach to recovering absolute depth from monocular inputs.

Note to Practitioners: This paper aims to solve the problem of absolute monocular depth estimation in automatic surgical navigation by leveraging the multi-modal data from the robot-based endoscopic system. Accurate depth perception of the monocular scene at real-world scale is essential for the control of surgical robots in automatic navigation. However, current methods can only predict the relative depth of the surgical scene from monocular images. In this article, we propose a self-supervised learning-based method to achieve high-quality absolute depth estimation for monocular endoscopic images. It requires neither manual data annotation nor other imaging modalities. The experiments extensively validate the feasibility and high performance of our framework for absolute depth estimation with monocular endoscopes. This absolute depth perception framework could potentially be integrated into automatic navigation systems in the near future.
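The abstract does not spell out how the scale is computed from robot kinematics, so the following is only a minimal sketch of one common way to couple a metric scale with relative depth, not the SADER algorithm itself. It assumes that a self-supervised network yields up-to-scale camera translations between frame pairs, that the robot kinematics yields the corresponding metric translations, and that the ratio of their magnitudes approximates the scale. The function names `estimate_scale` and `pseudo_absolute_depth`, the median aggregation, and the millimetre units are all hypothetical choices made for illustration.

```python
import math
from statistics import median


def translation_norm(t):
    """Euclidean length of a 3D translation vector."""
    return math.sqrt(sum(c * c for c in t))


def estimate_scale(kinematic_translations, predicted_translations, eps=1e-8):
    """Estimate one global scale from per-frame-pair translations.

    Assumption (not from the paper): the scale is the median ratio of
    the metric translation length (robot kinematics, e.g. in mm) to the
    up-to-scale translation length predicted by the network.
    """
    ratios = []
    for t_kin, t_pred in zip(kinematic_translations, predicted_translations):
        n_pred = translation_norm(t_pred)
        # Skip pairs with almost no predicted motion: the ratio is unstable.
        if n_pred > eps:
            ratios.append(translation_norm(t_kin) / n_pred)
    if not ratios:
        raise ValueError("no frame pair with usable predicted motion")
    return median(ratios)


def pseudo_absolute_depth(relative_depth, scale):
    """Multiply a relative depth map (list of rows) by the scale."""
    return [[scale * d for d in row] for row in relative_depth]


# Hypothetical toy data: three frame pairs and a 2x2 relative depth map.
kinematic = [(0.0, 0.0, 2.0), (3.0, 4.0, 0.0), (0.0, 1.0, 0.0)]
predicted = [(0.0, 0.0, 0.5), (0.75, 1.0, 0.0), (0.0, 0.0, 0.0)]
scale = estimate_scale(kinematic, predicted)
labels = pseudo_absolute_depth([[0.5, 1.0], [2.0, 0.25]], scale)
```

The median is used here only because it tolerates occasional outlier frame pairs; the resulting `labels` play the role of the pseudo absolute depth labels that supervise the second training stage.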
Keywords
Surgical robotics, endoscope, absolute depth estimation, monocular images, multi-modal learning