Epipolar Attention Field Transformers for Bird's Eye View Semantic Segmentation

December 02, 2024 · Declared Dead · 🏛 IEEE Workshop/Winter Conference on Applications of Computer Vision

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Christian Witte, Jens Behley, Cyrill Stachniss, Marvin Raaijmakers
arXiv ID: 2412.01595
Category: cs.CV: Computer Vision (cross-listed: cs.RO)
Citations: 1
Venue: IEEE Workshop/Winter Conference on Applications of Computer Vision
Last Checked: 3 months ago

Abstract

Spatial understanding of the semantics of the surroundings is a key capability needed by autonomous cars to enable safe driving decisions. Recently, purely vision-based solutions have gained increasing research interest. In particular, approaches extracting a bird's eye view (BEV) from multiple cameras have demonstrated great performance for spatial understanding. This paper addresses the dependency on learned positional encodings to correlate image and BEV feature map elements for transformer-based methods. We propose leveraging epipolar geometric constraints to model the relationship between cameras and the BEV by Epipolar Attention Fields. They are incorporated into the attention mechanism as a novel attribution term, serving as an alternative to learned positional encodings. Experiments show that our method EAFormer outperforms previous BEV approaches by 2% mIoU for map semantic segmentation and exhibits superior generalization capabilities compared to implicitly learning the camera configuration.
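The abstract describes Epipolar Attention Fields only at a high level. What follows is a minimal sketch of one plausible reading, not the authors' EAFormer implementation. It assumes that each BEV cell is lifted to a vertical pillar and projected into a camera as an image-space line, and that a Gaussian falloff with pixel distance from that line is added to the attention logits in place of a learned positional encoding. The pillar height range (`z_low`, `z_high`), the Gaussian width `sigma`, the camera conventions, and every function name (`project`, `epipolar_field_bias`, `biased_attention`) are hypothetical choices made for illustration.

```python
import numpy as np


def project(K, R, t, pts_world):
    """Pinhole projection of (N, 3) world points to (N, 2) pixels, with x_cam = R @ x_world + t."""
    cam = pts_world @ R.T + t            # world frame -> camera frame
    uvw = cam @ K.T                      # camera frame -> homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]      # perspective division (assumes positive depth)


def epipolar_field_bias(K, R, t, bev_xy, pixels, z_low=-1.0, z_high=3.0, sigma=4.0):
    """Hypothetical (n_bev, n_pix) log-space bias: each BEV cell is lifted to a vertical
    pillar, the pillar is projected into the image as a line, and each pixel is
    penalised by its squared distance to that line (a Gaussian field in log space)."""
    n = len(bev_xy)
    bottom = project(K, R, t, np.column_stack([bev_xy, np.full(n, z_low)]))
    top = project(K, R, t, np.column_stack([bev_xy, np.full(n, z_high)]))
    direction = top - bottom
    normal = np.stack([-direction[:, 1], direction[:, 0]], axis=1)
    normal /= np.linalg.norm(normal, axis=1, keepdims=True)   # unit normal of each image line
    offsets = pixels[None, :, :] - bottom[:, None, :]         # (n_bev, n_pix, 2)
    dist = np.einsum("bpk,bk->bp", offsets, normal)           # signed point-to-line distance
    return -dist**2 / (2.0 * sigma**2)


def biased_attention(q, k, v, bias):
    """Scaled dot-product attention with an additive bias on the logits."""
    logits = q @ k.T / np.sqrt(q.shape[-1]) + bias
    logits -= logits.max(axis=-1, keepdims=True)              # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Usage: camera 1.5 m above the origin, looking along world +x (world: x forward, y left, z up).
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R = np.array([[0.0, -1.0, 0.0], [0.0, 0.0, -1.0], [1.0, 0.0, 0.0]])
t = np.array([0.0, 1.5, 0.0])
bev_xy = np.array([[10.0, 0.0]])                              # one BEV cell 10 m ahead
pixels = np.array([[320.0, 300.0], [330.0, 300.0], [400.0, 300.0]])
bias = epipolar_field_bias(K, R, t, bev_xy, pixels)           # pillar projects to the line u = 320
q, k, v = np.zeros((1, 8)), np.zeros((3, 8)), np.eye(3)       # content terms zeroed to isolate geometry
out, w = biased_attention(q, k, v, bias)
```

Adding the log of the field to the logits is equivalent to multiplying the unnormalised attention weights by the field itself, so in this sketch geometry reweights content-based attention rather than replacing it. It also assumes every BEV cell lies in front of the camera; cells behind it would need to be masked out before the softmax.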



📜 Similar Papers

In the same crypt — Computer Vision

🌅 Old Age

Deep Residual Learning for Image Recognition

Kaiming He, Xiangyu Zhang, ... (+2 more)

cs.CV 🏛 CVPR 📚 220.4K cites 10 years ago

🌅 Old Age

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren, Kaiming He, ... (+2 more)

cs.CV 🏛 IEEE TPAMI 📚 70.4K cites 11 years ago

🌅 Old Age

SSD: Single Shot MultiBox Detector

Wei Liu, Dragomir Anguelov, ... (+5 more)

cs.CV 🏛 ECCV 📚 33.8K cites 10 years ago

🌅 Old Age

Squeeze-and-Excitation Networks

Jie Hu, Li Shen, ... (+3 more)

cs.CV 🏛 CVPR 📚 32.3K cites 8 years ago

🌅 Old Age

Fast R-CNN

Ross Girshick

cs.CV 🏛 ICCV 📚 27.7K cites 11 years ago

🌅 Old Age

Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization

Ramprasaath R. Selvaraju, Michael Cogswell, ... (+4 more)

cs.CV 🏛 IJCV 📚 24.9K cites 9 years ago

Died the same way — 👻 Ghosted

R.I.P. 👻 Ghosted

Federated Learning: Strategies for Improving Communication Efficiency

Jakub Konečný, H. Brendan McMahan, ... (+4 more)

cs.LG 🏛 arXiv 📚 5.2K cites 9 years ago

R.I.P. 👻 Ghosted

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, Cliff Young, ... (+73 more)

cs.AR 🏛 ISCA 📚 5.1K cites 9 years ago

R.I.P. 👻 Ghosted

Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning

Hoo-Chang Shin, Holger R. Roth, ... (+7 more)

cs.CV 🏛 IEEE TMI 📚 4.9K cites 10 years ago

R.I.P. 👻 Ghosted

Explanation in Artificial Intelligence: Insights from the Social Sciences

Tim Miller

cs.AI 🏛 AI 📚 4.9K cites 8 years ago