Unsupervised Visual Sense Disambiguation for Verbs using Multimodal Embeddings
March 30, 2016 ยท Declared Dead ยท ๐ North American Chapter of the Association for Computational Linguistics
Repo contents: 3.5k_verse_gold_image_sense_annotations.csv, README.md, gold_all_final_lemma_image_sense_annotations.csv, motion_verbs.csv, non_motion_verbs.csv, sense_specific_search_engine_queries.csv, verse_visualness_labels.csv
Authors
Spandana Gella, Mirella Lapata, Frank Keller
arXiv ID
1603.09188
Category
cs.CL: Computation & Language
Cross-listed
cs.CV
Citations
54
Venue
North American Chapter of the Association for Computational Linguistics
Repository
https://github.com/spandanagella/verse
โญ 13
Last Checked
1 month ago
Abstract
We introduce a new task, visual sense disambiguation for verbs: given an image and a verb, assign the correct sense of the verb, i.e., the one that describes the action depicted in the image. Just as textual word sense disambiguation is useful for a wide range of NLP tasks, visual sense disambiguation can be useful for multimodal tasks such as image retrieval, image description, and text illustration. We introduce VerSe, a new dataset that augments existing multimodal datasets (COCO and TUHOI) with sense labels. We propose an unsupervised algorithm based on Lesk which performs visual sense disambiguation using textual, visual, or multimodal embeddings. We find that textual embeddings perform well when gold-standard textual annotations (object labels and image descriptions) are available, while multimodal embeddings perform well on unannotated images. We also verify our findings by using the textual and multimodal embeddings as features in a supervised setting and analyse the performance of visual sense disambiguation task. VerSe is made publicly available and can be downloaded at: https://github.com/spandanagella/verse.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Computation & Language
๐
๐
Old Age
๐
๐
Old Age
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
R.I.P.
๐ป
Ghosted
Language Models are Few-Shot Learners
R.I.P.
๐ป
Ghosted
RoBERTa: A Robustly Optimized BERT Pretraining Approach
R.I.P.
๐ป
Ghosted
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
R.I.P.
๐ป
Ghosted
Deep contextualized word representations
Died the same way โ ๐ฆด Skeleton Repo
R.I.P.
๐ฆด
Skeleton Repo
EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification
R.I.P.
๐ฆด
Skeleton Repo
Deep Learning for 3D Point Clouds: A Survey
R.I.P.
๐ฆด
Skeleton Repo
Adversarial Examples: Attacks and Defenses for Deep Learning
R.I.P.
๐ฆด
Skeleton Repo