Radar Meets Vision: Robustifying Monocular Metric Depth Prediction for Mobile Robotics

October 01, 2024 · arXiv.org

Authors: Marco Job, Thomas Stastny, Tim Kazik, Roland Siegwart, Michael Pantic
arXiv ID: 2410.00736
Category: cs.RO (Robotics)
Citations: 1
Venue: arXiv.org
Repository: https://github.com/ethz-asl/radarmeetsvision

Abstract

Mobile robots require accurate and robust depth measurements to understand and interact with the environment. While existing sensing modalities address this problem to some extent, recent research on monocular depth estimation has leveraged the information richness, yet low cost and simplicity of monocular cameras. These works have shown significant generalization capabilities, mainly in automotive and indoor settings. However, robots often operate in environments with limited scale cues, self-similar appearances, and low texture. In this work, we encode measurements from a low-cost mmWave radar into the input space of a state-of-the-art monocular depth estimation model. Despite the radar's extreme point cloud sparsity, our method demonstrates generalization and robustness across industrial and outdoor experiments. Our approach reduces the absolute relative error of depth predictions by 9-64% across a range of unseen, real-world validation datasets. Importantly, we maintain consistency of all performance metrics across all experiments and scene depths where current vision-only approaches fail. We further address the present deficit of training data in mobile robotics environments by introducing a novel methodology for synthesizing rendered, realistic learning datasets based on photogrammetric data that simulate the radar sensor observations for training. Our code, datasets, and pre-trained networks are made available at https://github.com/ethz-asl/radarmeetsvision.
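
For readers unfamiliar with the headline metric, the sketch below shows how absolute relative error is commonly computed for depth prediction: the mean of |predicted - true| / true over pixels with valid ground truth. It is an illustration only, not the authors' evaluation code. The function names `abs_rel_error` and `relative_reduction` are hypothetical, the depth values are made-up toy numbers rather than results from the paper, and reading the reported 9-64% as a relative reduction against a vision-only baseline is an assumption.

```python
from typing import Sequence


def abs_rel_error(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean of |pred - gt| / gt over entries with valid (positive) ground truth."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    # Entries with non-positive ground truth are treated as missing and skipped.
    ratios = [abs(p - g) / g for p, g in zip(pred, gt) if g > 0]
    if not ratios:
        raise ValueError("no valid ground-truth depths")
    return sum(ratios) / len(ratios)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` lowers the error relative to `baseline`."""
    return (baseline - improved) / baseline


# Toy depths in metres (hypothetical values, not taken from the paper).
gt = [2.0, 4.0, 8.0, 0.0]            # 0.0 marks a pixel without ground truth
vision_only = [2.5, 5.0, 10.0, 3.0]  # hypothetical baseline prediction
with_radar = [2.4, 4.8, 9.6, 3.0]    # hypothetical radar-aided prediction

baseline_err = abs_rel_error(vision_only, gt)
radar_err = abs_rel_error(with_radar, gt)
reduction = relative_reduction(baseline_err, radar_err)
print(f"baseline={baseline_err:.3f} radar={radar_err:.3f} reduction={reduction:.1%}")
```

Because each error is divided by the true depth, the metric weights a 0.5 m mistake at 2 m the same as a 2 m mistake at 8 m, which makes it a convenient way to compare performance across scene depths.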