Towards a Unified Network for Robust Monocular Depth Estimation: Network Architecture, Training Strategy and Dataset

Mochu Xiang, Yuchao Dai, Feiyu Zhang, Jiawei Shi, Xinyu Tian, Zhensong Zhang

International Journal of Computer Vision (2024)

Abstract
Robust monocular depth estimation (MDE) aims to learn a unified model that works across diverse real-world scenes, and is an important and active topic in computer vision. In this paper, we present Megatron_RVC, our winning solution for the monocular depth challenge in the Robust Vision Challenge (RVC) 2022, in which we tackle the problem from three perspectives: network architecture, training strategy and dataset. In particular, we make three contributions towards robust MDE: (1) we build a high-capacity neural network to enable flexible and accurate monocular depth predictions, with dedicated components that provide content-aware embeddings and improve the richness of details; (2) we propose a novel mixing training strategy that handles real-world images with different aspect ratios and resolutions and applies tailored loss functions based on the properties of their depth maps; (3) to train a unified network model that covers diverse real-world scenes, we use over 1 million images from different datasets. As of 3 October 2022, our unified model ranked consistently first across three benchmarks (KITTI, MPI Sintel, and VIPER) among all participants.
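The abstract does not say which loss functions are used or how they are assigned to datasets. As a rough illustration only of what "tailored loss functions based on the properties of the depth maps" could look like, the sketch below dispatches between two well-known losses from the wider MDE literature: a scale-invariant log loss for samples assumed to carry metric depth, and a scale-and-shift-invariant L1 loss for samples assumed to carry only relative depth. The function names, the `"metric"`/`"relative"` labels, and the dispatch table are all hypothetical and are not taken from the paper.

```python
import math

def silog_loss(pred, target, lam=0.5):
    """Scale-invariant log loss over pixels with valid ground truth (target > 0).
    Assumption: used here for samples with metric depth; lam=1 makes the
    loss fully invariant to a global scale factor."""
    d = [math.log(p) - math.log(t) for p, t in zip(pred, target) if t > 0]
    n = len(d)
    mean_sq = sum(x * x for x in d) / n
    mean = sum(d) / n
    return mean_sq - lam * mean * mean

def align_scale_shift(pred, target):
    """Closed-form least-squares scale s and shift b so that s * pred + b fits target."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var = sum((p - mean_p) ** 2 for p in pred)
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    s = cov / var if var > 0 else 1.0
    b = mean_t - s * mean_p
    return s, b

def ssi_l1_loss(pred, target):
    """Scale-and-shift-invariant L1 loss.
    Assumption: used here for samples whose depth is known only up to scale and shift;
    all pixels are treated as valid."""
    s, b = align_scale_shift(pred, target)
    return sum(abs(s * p + b - t) for p, t in zip(pred, target)) / len(pred)

# Hypothetical mapping from a per-sample depth-type label to a loss function.
LOSS_BY_DEPTH_TYPE = {"metric": silog_loss, "relative": ssi_l1_loss}

def mixed_batch_loss(samples):
    """Average loss over a mixed batch of (pred, target, depth_type) tuples,
    where pred and target are flat lists of positive depth values."""
    total = 0.0
    for pred, target, depth_type in samples:
        total += LOSS_BY_DEPTH_TYPE[depth_type](pred, target)
    return total / len(samples)
```

In this sketch, each sample in a batch carries a label describing its ground truth, so images drawn from different datasets can share one batch while being penalized differently. A real training pipeline would operate on tensors with validity masks rather than flat Python lists.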
Keywords
Monocular depth estimation, Robust, Unified network, Multi-dataset training