DiffuSAM: Diffusion Guided Zero-Shot Object Grounding for Remote Sensing Imagery

April 20, 2026 ยท Grace Period ยท ๐Ÿ› ICLR 2026 ML4RS Workshop

โณ Grace Period
This paper is less than 90 days old. We give authors time to release their code before passing judgment.
Authors Geet Sethi, Panav Shah, Ashutosh Gandhe, Soumitra Darshan Nayak arXiv ID 2604.18201 Category cs.CV: Computer Vision Cross-listed cs.LG Citations 0 Venue ICLR 2026 ML4RS Workshop
Abstract
Diffusion models have emerged as powerful tools for a wide range of vision tasks, including text-guided image generation and editing. In this work, we explore their potential for object grounding in remote sensing imagery. We propose a hybrid pipeline that integrates diffusion-based localization cues with state-of-the-art segmentation models such as RemoteSAM and SAM3 to obtain more accurate bounding boxes. By leveraging the complementary strengths of generative diffusion models and foundational segmentation models, our approach enables robust and adaptive object localization across complex scenes. Experiments demonstrate that our pipeline significantly improves localization performance, achieving over a 14% increase in Acc@0.5 compared to existing state-of-the-art methods.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Computer Vision

๐ŸŒ… ๐ŸŒ… Old Age

Fast R-CNN

Ross Girshick

cs.CV ๐Ÿ› ICCV ๐Ÿ“š 27.7K cites 11 years ago