Unsupervised Discovery of Multimodal Links in Multi-image, Multi-sentence Documents

April 16, 2019 · Entered Twilight · 🏛 Conference on Empirical Methods in Natural Language Processing

🌅 TWILIGHT: Old Age
Predates the code-sharing era: a pioneer of its time

"No code URL or promise found in abstract"
"Code repo scraped from project page (backfill)"

Evidence collected by the PWNC Scanner

Repo contents: LICENSE, README.md, bipartite_utils.py, data, eval_utils.py, image_feature_extract, image_utils.py, model_utils.py, paper_commands, requirements.txt, summary.png, text_utils.py, train_doc.py, training_utils.py, visualize_predictions_graph.py, visualize_predictions_html.py

Authors: Jack Hessel, Lillian Lee, David Mimno
arXiv ID: 1904.07826
Category: cs.CL (Computation & Language)
Cross-listed: cs.CV
Citations: 31
Venue: Conference on Empirical Methods in Natural Language Processing
Repository: https://github.com/jmhessel/multi-retrieval (⭐ 30)
Last Checked: 6 days ago
Abstract
Images and text co-occur constantly on the web, but explicit links between images and sentences (or other intra-document textual units) are often not present. We present algorithms that discover image-sentence relationships without relying on explicit multimodal annotation in training. We experiment on seven datasets of varying difficulty, ranging from documents consisting of groups of images captioned post hoc by crowdworkers to naturally-occurring user-generated multimodal documents. We find that a structured training objective based on identifying whether collections of images and sentences co-occur in documents can suffice to predict links between specific sentences and specific images within the same document at test time.
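The idea in the abstract, training on document-level co-occurrence and then reading off sentence-image links at test time, can be illustrated with a small sketch. This is not the authors' implementation: the cosine scoring, the "best image per sentence" aggregation in doc_similarity, the hinge loss form, and the margin value are all assumptions chosen for brevity, and every function name here is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(sentences, images):
    """Rows are sentences, columns are images."""
    return [[cosine(s, i) for i in images] for s in sentences]


def doc_similarity(sentences, images):
    """Assumed aggregation: mean over sentences of the best-matching image score."""
    sim = similarity_matrix(sentences, images)
    return sum(max(row) for row in sim) / len(sim)


def hinge_loss(sentences, true_images, negative_images, margin=0.2):
    """Penalize cases where images from another document score within `margin`
    of the images that truly co-occur with these sentences."""
    pos = doc_similarity(sentences, true_images)
    neg = doc_similarity(sentences, negative_images)
    return max(0.0, margin - pos + neg)


def predict_links(sentences, images):
    """At test time, link each sentence to its highest-scoring image in the same document."""
    sim = similarity_matrix(sentences, images)
    return [max(range(len(row)), key=row.__getitem__) for row in sim]


# Toy embeddings (made up for illustration only).
sentences = [[1.0, 0.0], [0.0, 1.0]]
true_images = [[0.0, 2.0], [3.0, 0.0]]
negative_images = [[1.0, 1.0], [-1.0, 0.0]]

print(predict_links(sentences, true_images))
print(hinge_loss(sentences, true_images, negative_images, margin=0.5))
```

Only the document-level pairing (which images belong with which sentences as a set) enters the loss; the per-sentence links fall out of the same similarity matrix afterwards. A one-to-one matching between sentences and images would be a natural alternative to the per-row maximum used here.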

📜 Similar Papers

In the same crypt: Computation & Language

🌅 Old Age

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, ... (+6 more)

cs.CL · 🏛 NeurIPS · 📚 166.0K cites · 8 years ago