HuBo-VLM: Unified Vision-Language Model designed for HUman roBOt interaction tasks

August 24, 2023 · Declared Dead · 🏛 arXiv.org

Repo contents: README.md

Authors: Zichao Dong, Weikun Zhang, Xufeng Huang, Hang Ji, Xin Zhan, Junbo Chen
arXiv ID: 2308.12537
Category: cs.RO (Robotics)
Cross-listed: cs.CV
Citations: 7
Venue: arXiv.org
Repository: https://github.com/dzcgaara/HuBo-VLM (⭐ 7)
Last Checked: 1 month ago

Abstract

Human-robot interaction is an exciting task that aims to guide robots to follow instructions from humans. Because a huge gap lies between human natural language and machine code, end-to-end human-robot interaction models are fairly challenging. Further, visual information received from a robot's sensors is also a hard language for the robot to perceive. In this work, HuBo-VLM is proposed to tackle perception tasks associated with human-robot interaction, including object detection and visual grounding, with a unified transformer-based vision-language model. Extensive experiments on the Talk2Car benchmark demonstrate the effectiveness of our approach. Code will be made publicly available at https://github.com/dzcgaara/HuBo-VLM.


📜 Similar Papers

In the same crypt — Robotics

🌅 Old Age

ORB-SLAM: a Versatile and Accurate Monocular SLAM System

Raul Mur-Artal, J. M. M. Montiel, Juan D. Tardos

cs.RO 🏛 IEEE TRO 📚 7.0K cites 11 years ago

R.I.P. 👻 Ghosted

ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras

Raul Mur-Artal, Juan D. Tardos

cs.RO 🏛 IEEE TRO 📚 6.1K cites 9 years ago

R.I.P. 👻 Ghosted

VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator

Tong Qin, Peiliang Li, Shaojie Shen

cs.RO 🏛 IEEE TRO 📚 4.0K cites 8 years ago

R.I.P. 👻 Ghosted

ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM

Carlos Campos, Richard Elvira, ... (+3 more)

cs.RO 🏛 IEEE TRO 📚 3.8K cites 5 years ago

R.I.P. 👻 Ghosted

Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World

Josh Tobin, Rachel Fong, ... (+4 more)

cs.RO 🏛 IROS 📚 3.5K cites 9 years ago

R.I.P. 👻 Ghosted

Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age

Cesar Cadena, Luca Carlone, ... (+6 more)

cs.RO 🏛 IEEE TRO 📚 3.2K cites 9 years ago

Died the same way — 📜 Death by README

R.I.P. 📜 Death by README

Momentum Contrast for Unsupervised Visual Representation Learning

Kaiming He, Haoqi Fan, ... (+3 more)

cs.CV 🏛 CVPR 📚 14.3K cites 6 years ago

R.I.P. 📜 Death by README

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model

Peng Gao, Jiaming Han, ... (+10 more)

cs.CV 🏛 arXiv 📚 716 cites 2 years ago

R.I.P. 📜 Death by README

Revisiting Graph based Collaborative Filtering: A Linear Residual Graph Convolutional Network Approach

Lei Chen, Le Wu, ... (+3 more)

cs.IR 🏛 AAAI 📚 609 cites 6 years ago

R.I.P. 📜 Death by README

Diffusion Models for Medical Image Analysis: A Comprehensive Survey

Amirhossein Kazerouni, Ehsan Khodapanah Aghdam, ... (+5 more)

eess.IV 🏛 MedIA 📚 599 cites 3 years ago