PanSR: An Object-Centric Mask Transformer for Panoptic Segmentation
December 13, 2024 Β· Declared Dead Β· π arXiv.org
Repo contents: README.md
Authors
Lojze Ε½ust, Matej Kristan
arXiv ID
2412.10589
Category
cs.CV: Computer Vision
Citations
1
Venue
arXiv.org
Repository
https://github.com/lojzezust/PanSR
β 8
Last Checked
1 month ago
Abstract
Panoptic segmentation is a fundamental task in computer vision and a crucial component for perception in autonomous vehicles. Recent mask-transformer-based methods achieve impressive performance on standard benchmarks but face significant challenges with small objects, crowded scenes and scenes exhibiting a wide range of object scales. We identify several fundamental shortcomings of the current approaches: (i) the query proposal generation process is biased towards larger objects, resulting in missed smaller objects, (ii) initially well-localized queries may drift to other objects, resulting in missed detections, (iii) spatially well-separated instances may be merged into a single mask causing inconsistent and false scene interpretations. To address these issues, we rethink the individual components of the network and its supervision, and propose a novel method for panoptic segmentation PanSR. PanSR effectively mitigates instance merging, enhances small-object detection and increases performance in crowded scenes, delivering a notable +3.4 PQ improvement over state-of-the-art on the challenging LaRS benchmark, while reaching state-of-the-art performance on Cityscapes. The code and models will be publicly available at https://github.com/lojzezust/PanSR.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Computer Vision
π
π
Old Age
π
π
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R.I.P.
π»
Ghosted
You Only Look Once: Unified, Real-Time Object Detection
π
π
Old Age
SSD: Single Shot MultiBox Detector
π
π
Old Age
Squeeze-and-Excitation Networks
R.I.P.
π»
Ghosted
Rethinking the Inception Architecture for Computer Vision
Died the same way β π Death by README
R.I.P.
π
Death by README
Momentum Contrast for Unsupervised Visual Representation Learning
R.I.P.
π
Death by README
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
R.I.P.
π
Death by README
Revisiting Graph based Collaborative Filtering: A Linear Residual Graph Convolutional Network Approach
R.I.P.
π
Death by README