OmniVec: Learning robust representations with cross modal sharing
November 07, 2023 ยท Declared Dead ยท ๐ IEEE Workshop/Winter Conference on Applications of Computer Vision
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Siddharth Srivastava, Gaurav Sharma
arXiv ID
2311.05709
Category
cs.CV: Computer Vision
Cross-listed
cs.LG
Citations
86
Venue
IEEE Workshop/Winter Conference on Applications of Computer Vision
Last Checked
3 months ago
Abstract
Majority of research in learning based methods has been towards designing and training networks for specific tasks. However, many of the learning based tasks, across modalities, share commonalities and could be potentially tackled in a joint framework. We present an approach in such direction, to learn multiple tasks, in multiple modalities, with a unified architecture. The proposed network is composed of task specific encoders, a common trunk in the middle, followed by task specific prediction heads. We first pre-train it by self-supervised masked training, followed by sequential training for the different tasks. We train the network on all major modalities, e.g.\ visual, audio, text and 3D, and report results on $22$ diverse and challenging public benchmarks. We demonstrate empirically that, using a joint network to train across modalities leads to meaningful information sharing and this allows us to achieve state-of-the-art results on most of the benchmarks. We also show generalization of the trained network on cross-modal tasks as well as unseen datasets and tasks.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Computer Vision
๐
๐
Old Age
๐
๐
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R.I.P.
๐ป
Ghosted
You Only Look Once: Unified, Real-Time Object Detection
๐
๐
Old Age
SSD: Single Shot MultiBox Detector
๐
๐
Old Age
Squeeze-and-Excitation Networks
R.I.P.
๐ป
Ghosted
Rethinking the Inception Architecture for Computer Vision
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Language Models are Few-Shot Learners
R.I.P.
๐ป
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
๐ป
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
๐ป
Ghosted