Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving

October 03, 2023 · Declared Dead · 🏛 IEEE International Conference on Robotics and Automation

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Long Chen, Oleg Sinavski, Jan Hünermann, Alice Karnsund, Andrew James Willmott, Danny Birch, Daniel Maund, Jamie Shotton arXiv ID 2310.01957 Category cs.RO: Robotics Cross-listed cs.AI, cs.CL, cs.CV Citations 308 Venue IEEE International Conference on Robotics and Automation Last Checked 3 months ago

Abstract

Large Language Models (LLMs) have shown promise in the autonomous driving sector, particularly in generalization and interpretability. We introduce a unique object-level multimodal LLM architecture that merges vectorized numeric modalities with a pre-trained LLM to improve context understanding in driving situations. We also present a new dataset of 160k QA pairs derived from 10k driving scenarios, paired with high quality control commands collected with RL agent and question answer pairs generated by teacher LLM (GPT-3.5). A distinct pretraining strategy is devised to align numeric vector modalities with static LLM representations using vector captioning language data. We also introduce an evaluation metric for Driving QA and demonstrate our LLM-driver's proficiency in interpreting driving scenarios, answering questions, and decision-making. Our findings highlight the potential of LLM-based driving action generation in comparison to traditional behavioral cloning. We make our benchmark, datasets, and model available for further exploration.

📄 View on arXiv 🌐 View on ar5iv 📑 PDF 🎉 Report Code Found

Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

📜 Similar Papers

In the same crypt — Robotics

🌅 🌅 Old Age

ORB-SLAM: a Versatile and Accurate Monocular SLAM System

Raul Mur-Artal, J. M. M. Montiel, Juan D. Tardos

cs.RO 🏛 IEEE TRO 📚 7.0K cites 11 years ago

R.I.P. 👻 Ghosted

ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras

Raul Mur-Artal, Juan D. Tardos

cs.RO 🏛 IEEE TRO 📚 6.1K cites 9 years ago

R.I.P. 👻 Ghosted

VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator

Tong Qin, Peiliang Li, Shaojie Shen

cs.RO 🏛 IEEE TRO 📚 4.0K cites 8 years ago

R.I.P. 👻 Ghosted

ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM

Carlos Campos, Richard Elvira, ... (+3 more)

cs.RO 🏛 IEEE TRO 📚 3.8K cites 5 years ago

R.I.P. 👻 Ghosted

Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World

Josh Tobin, Rachel Fong, ... (+4 more)

cs.RO 🏛 IROS 📚 3.5K cites 9 years ago

R.I.P. 👻 Ghosted

Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age

Cesar Cadena, Luca Carlone, ... (+6 more)

cs.RO 🏛 IEEE TRO 📚 3.2K cites 9 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann, ... (+29 more)

cs.CL 🏛 NeurIPS 📚 54.2K cites 6 years ago

R.I.P. 👻 Ghosted

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, Sam Gross, ... (+19 more)

cs.LG 🏛 NeurIPS 📚 49.7K cites 6 years ago

R.I.P. 👻 Ghosted

XGBoost: A Scalable Tree Boosting System

Tianqi Chen, Carlos Guestrin

cs.LG 🏛 KDD 📚 49.2K cites 10 years ago

R.I.P. 👻 Ghosted

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy

cs.LG 🏛 ICML 📚 46.0K cites 11 years ago