MB-ORES: A Multi-Branch Object Reasoner for Visual Grounding in Remote Sensing

March 31, 2025 · Declared Dead · 🏛 arXiv.org

Authors: Karim Radouane, Hanane Azzag, Mustapha Lebbah
arXiv ID: 2503.24219
Category: cs.CV (Computer Vision), cross-listed in cs.AI, cs.CL, cs.LG, cs.MM
Citations: 0
Venue: arXiv.org
Repository: https://github.com/rd20karim/MB-ORES
Last Checked: 2 months ago

Abstract

We propose a unified framework that integrates object detection (OD) and visual grounding (VG) for remote sensing (RS) imagery. To support conventional OD and establish an intuitive prior for the VG task, we fine-tune an open-set object detector using referring expression data, framing it as a partially supervised OD task. In the first stage, we construct a graph representation of each image, comprising object queries, class embeddings, and proposal locations. Then, our task-aware architecture processes this graph to perform the VG task. The model consists of: (i) a multi-branch network that integrates spatial, visual, and categorical features to generate task-aware proposals, and (ii) an object reasoning network that assigns probabilities across proposals, followed by a soft selection mechanism for final referring object localization. Our model demonstrates superior performance on the OPT-RSVG and DIOR-RSVG datasets, achieving significant improvements over state-of-the-art methods while retaining classical OD capabilities. The code will be available in our repository: https://github.com/rd20karim/MB-ORES.
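The abstract names the stages of the pipeline but not their exact operators. The sketch below illustrates one plausible reading of the last stages, in plain Python: each proposal is a graph node carrying a visual feature (object query), a categorical feature (class embedding), and a spatial feature (box); the three branches are fused, scored against the referring expression, converted into probabilities, and softly selected. The concatenation fusion, the dot-product scoring against an expression embedding, the probability-weighted box average, and every name in the code (`ProposalNode`, `fuse_branches`, `score_proposals`, `soft_select`) are assumptions for illustration, not the MB-ORES implementation; graph edges and learned weights are omitted.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Spatial feature of a proposal: (cx, cy, w, h), assumed normalized to [0, 1].
Box = Tuple[float, float, float, float]


@dataclass
class ProposalNode:
    """Hypothetical graph node holding one detector proposal's features."""
    query: List[float]            # visual branch: object query from the detector
    class_embedding: List[float]  # categorical branch: class embedding
    box: Box                      # spatial branch: proposal location


def fuse_branches(node: ProposalNode) -> List[float]:
    """Assumed fusion: concatenate the spatial, visual and categorical branches."""
    return list(node.box) + list(node.query) + list(node.class_embedding)


def score_proposals(nodes: Sequence[ProposalNode], expr: Sequence[float]) -> List[float]:
    """Assumed reasoning step: one logit per proposal, computed as a dot product
    with an expression embedding of the same length as the fused vector."""
    logits = []
    for node in nodes:
        fused = fuse_branches(node)
        if len(fused) != len(expr):
            raise ValueError("expression embedding must match fused feature size")
        logits.append(sum(f * e for f, e in zip(fused, expr)))
    return logits


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn logits into probabilities; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_select(nodes: Sequence[ProposalNode], logits: Sequence[float]) -> Tuple[Box, List[float]]:
    """Assumed soft selection: probability-weighted average of the proposal boxes."""
    probs = softmax(logits)
    box = tuple(
        sum(p * node.box[k] for p, node in zip(probs, nodes)) for k in range(4)
    )
    return box, probs


if __name__ == "__main__":
    nodes = [
        ProposalNode(query=[0.1, 0.2], class_embedding=[1.0, 0.0], box=(0.2, 0.3, 0.1, 0.1)),
        ProposalNode(query=[0.9, 0.8], class_embedding=[0.0, 1.0], box=(0.7, 0.6, 0.2, 0.3)),
    ]
    # Toy expression embedding: ignores the box and rewards the second class.
    expr = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 2.0]
    logits = score_proposals(nodes, expr)
    box, probs = soft_select(nodes, logits)
    print([round(p, 3) for p in probs], [round(c, 3) for c in box])
```

Averaging boxes with softmax weights keeps the final localization differentiable, which a hard argmax over proposals would not; whether MB-ORES uses exactly this form should be checked against the paper and its repository.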

📜 Similar Papers

In the same crypt — Computer Vision

🌅 🌅 Old Age

Deep Residual Learning for Image Recognition

Kaiming He, Xiangyu Zhang, ... (+2 more)

cs.CV 🏛 CVPR 📚 220.4K cites 10 years ago

🌅 🌅 Old Age

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren, Kaiming He, ... (+2 more)

cs.CV 🏛 IEEE TPAMI 📚 70.4K cites 10 years ago

R.I.P. 👻 Ghosted

You Only Look Once: Unified, Real-Time Object Detection

Joseph Redmon, Santosh Divvala, ... (+2 more)

cs.CV 🏛 CVPR 📚 43.4K cites 10 years ago

🌅 🌅 Old Age

SSD: Single Shot MultiBox Detector

Wei Liu, Dragomir Anguelov, ... (+5 more)

cs.CV 🏛 ECCV 📚 33.8K cites 10 years ago

🌅 🌅 Old Age

Squeeze-and-Excitation Networks

Jie Hu, Li Shen, ... (+3 more)

cs.CV 🏛 CVPR 📚 32.3K cites 8 years ago

R.I.P. 👻 Ghosted

Rethinking the Inception Architecture for Computer Vision

Christian Szegedy, Vincent Vanhoucke, ... (+3 more)

cs.CV 🏛 CVPR 📚 30.2K cites 10 years ago

Died the same way — 💀 404 Not Found

R.I.P. 💀 404 Not Found

Deep High-Resolution Representation Learning for Visual Recognition

Jingdong Wang, Ke Sun, ... (+10 more)

cs.CV 🏛 IEEE TPAMI 📚 4.4K cites 6 years ago

R.I.P. 💀 404 Not Found

HuggingFace's Transformers: State-of-the-art Natural Language Processing

Thomas Wolf, Lysandre Debut, ... (+20 more)

cs.CL 🏛 arXiv 📚 3.5K cites 6 years ago

R.I.P. 💀 404 Not Found

CCNet: Criss-Cross Attention for Semantic Segmentation

Zilong Huang, Xinggang Wang, ... (+5 more)

cs.CV 🏛 ICCV 📚 2.9K cites 7 years ago

R.I.P. 💀 404 Not Found

Unified Perceptual Parsing for Scene Understanding

Tete Xiao, Yingcheng Liu, ... (+3 more)

cs.CV 🏛 ECCV 📚 2.3K cites 7 years ago