Industrial object detection with multi-modal SSD: closing the gap between synthetic and real images

Multimedia Tools and Applications (2024)

Abstract
Object detection for industrial applications faces challenges that state-of-the-art deep learning models have yet to solve. Training data is usually scarce, and the common workaround of training on a synthetic dataset introduces a domain gap when the model is later given real images. Moreover, few architectures fit in the small memory of a mobile device and run in real time with limited computational resources. Models that meet these requirements generally have low learning capacity, and the domain gap further reduces their performance. In this work, we propose multiple strategies to reduce the domain gap when using RGB-D images and to increase the overall performance of a Convolutional Neural Network (CNN) for object detection with only a modest increase in model size. First, we propose a new architecture based on the Single Shot Detector (SSD) architecture, and we compare different fusion methods that increase performance with few or no additional parameters. We apply the proposed method to three synthetic datasets with different visual characteristics and show that classical image processing significantly reduces the domain gap for depth maps. Our experiments show an improvement when fusing RGB and depth images on two benchmark datasets, even when the depth maps contain little discriminative information. Our RGB-D SSD Lite model performs on par with or better than a ResNet-FPN RetinaNet model on the LINEMOD and T-LESS datasets while requiring one-twentieth of the computation. Finally, we provide some insights on training a robust model that performs better when one of the modalities is missing.
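The abstract does not specify which fusion methods or robustness strategy the authors use. As a rough illustration only, the sketch below assumes early fusion (stacking the depth map as a fourth input channel next to RGB) and modality dropout (randomly blanking one modality during training), a common way to make a multi-modal model tolerate a missing input; neither is confirmed to be the paper's method. The helper names `early_fuse`, `modality_dropout` and `first_conv_params`, and the 3×3, 32-filter first convolution used for the parameter count, are hypothetical.

```python
import numpy as np


def early_fuse(rgb: np.ndarray, depth: np.ndarray) -> np.ndarray:
    """Hypothetical early fusion: stack an HxWx3 RGB image and an HxW depth map into HxWx4."""
    if rgb.shape[:2] != depth.shape[:2]:
        raise ValueError("RGB and depth must share the same spatial size")
    return np.concatenate([rgb, depth[..., np.newaxis]], axis=-1)


def modality_dropout(rgb, depth, p_drop=0.2, rng=None):
    """Hypothetical training-time augmentation (assumption, not the paper's method):
    with probability p_drop, replace exactly one modality, chosen uniformly, with zeros."""
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < p_drop:
        if rng.random() < 0.5:
            rgb = np.zeros_like(rgb)
        else:
            depth = np.zeros_like(depth)
    return rgb, depth


def first_conv_params(in_channels: int, out_channels: int = 32, k: int = 3) -> int:
    """Parameters of a hypothetical k x k first convolution: weights plus one bias per filter."""
    return k * k * in_channels * out_channels + out_channels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.random((300, 300, 3), dtype=np.float32)
    depth = rng.random((300, 300), dtype=np.float32)
    x = early_fuse(*modality_dropout(rgb, depth, rng=rng))
    print(x.shape)  # (300, 300, 4)
    print(first_conv_params(3), first_conv_params(4))  # 896 1184
```

Under these assumptions, early fusion only widens the first convolution: for the hypothetical layer above, going from 3 to 4 input channels adds 3 × 3 × 32 = 288 weights (896 vs. 1,184 parameters including biases), which is in line with the abstract's point that fusion can add few parameters.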
Keywords
Object detection, Deep learning, Synthetic dataset, Industrial, RGB-D