LIDAR-Camera Fusion for Road Detection Using Fully Convolutional Neural Networks

September 21, 2018 · Declared Dead · 🏛 Robotics Auton. Syst.

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Luca Caltagirone, Mauro Bellone, Lennart Svensson, Mattias Wahde
arXiv ID: 1809.07941
Category: cs.CV (Computer Vision)
Citations: 334
Venue: Robotics Auton. Syst.
Last Checked: 3 months ago

Abstract

In this work, a deep learning approach has been developed to carry out road detection by fusing LIDAR point clouds and camera images. An unstructured and sparse point cloud is first projected onto the camera image plane and then upsampled to obtain a set of dense 2D images encoding spatial information. Several fully convolutional neural networks (FCNs) are then trained to carry out road detection, either by using data from a single sensor, or by using three fusion strategies: early, late, and the newly proposed cross fusion. Whereas in the former two fusion approaches, the integration of multimodal information is carried out at a predefined depth level, the cross fusion FCN is designed to directly learn from data where to integrate information; this is accomplished by using trainable cross connections between the LIDAR and the camera processing branches. To further highlight the benefits of using a multimodal system for road detection, a data set consisting of visually challenging scenes was extracted from driving sequences of the KITTI raw data set. It was then demonstrated that, as expected, a purely camera-based FCN severely underperforms on this data set. A multimodal system, on the other hand, is still able to provide high accuracy. Finally, the proposed cross fusion FCN was evaluated on the KITTI road benchmark where it achieved excellent performance, with a MaxF score of 96.03%, ranking it among the top-performing approaches.
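The two mechanisms named in the abstract, projecting the point cloud onto the camera image plane and exchanging information through trainable cross connections, can be illustrated with a minimal sketch. Everything in it is an assumption rather than the authors' implementation: the function names `project_points` and `cross_connect` are hypothetical, the points are assumed to be already expressed in the camera frame, a plain pinhole model with intrinsics `fx`, `fy`, `cx`, `cy` stands in for the full KITTI calibration, the upsampling step is omitted, and each cross connection is modeled as a single scalar weight (`a`, `b`) applied to toy feature vectors.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in the camera frame, z forward (assumed)


def project_points(points: Sequence[Point], fx: float, fy: float,
                   cx: float, cy: float, width: int, height: int
                   ) -> List[List[Optional[float]]]:
    """Project 3D points to a sparse depth image using a pinhole model.

    Pixels that receive no point stay None; when several points land on
    the same pixel, the closest one (smallest z) is kept.
    """
    depth: List[List[Optional[float]]] = [[None] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0.0:  # behind the camera, not visible
            continue
        u = int(round(fx * x / z + cx))  # pixel column
        v = int(round(fy * y / z + cy))  # pixel row
        if 0 <= u < width and 0 <= v < height:
            current = depth[v][u]
            if current is None or z < current:
                depth[v][u] = z
    return depth


def cross_connect(lidar_feat: Sequence[float], camera_feat: Sequence[float],
                  a: float, b: float) -> Tuple[List[float], List[float]]:
    """Mix two branches with scalar cross weights (an assumed form).

    With a = b = 0 the branches stay independent; nonzero weights let
    each branch receive a scaled copy of the other branch's features.
    """
    lidar_out = [l + a * c for l, c in zip(lidar_feat, camera_feat)]
    camera_out = [c + b * l for l, c in zip(lidar_feat, camera_feat)]
    return lidar_out, camera_out


if __name__ == "__main__":
    pts = [(0.0, 0.0, 10.0), (1.0, 0.0, 10.0), (0.0, 0.0, -5.0), (0.0, 0.0, 5.0)]
    img = project_points(pts, fx=10.0, fy=10.0, cx=2.0, cy=2.0, width=5, height=5)
    print(img[2])
    print(cross_connect([1.0, 2.0], [10.0, 20.0], a=0.5, b=0.5))
```

In a real network the weights `a` and `b` would be learned per layer alongside the convolution filters, which is one way to let training decide at which depth the two modalities are integrated, in contrast to early or late fusion where that depth is fixed in advance.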