Learning Visual Reasoning Without Strong Priors
July 10, 2017 ยท Entered Twilight ยท ๐ International Conference on Machine Learning
"Last commit was 5.0 years ago (โฅ5 year threshold)"
Evidence collected by the PWNC Scanner
Repo contents: .gitignore, CLEVR_eval_with_q_type.py, LICENSE, README.md, img, requirements.txt, scripts, vr
Authors
Ethan Perez, Harm de Vries, Florian Strub, Vincent Dumoulin, Aaron Courville
arXiv ID
1707.03017
Category
cs.CV: Computer Vision
Cross-listed
cs.AI,
cs.CL,
stat.ML
Citations
63
Venue
International Conference on Machine Learning
Repository
https://github.com/ethanjperez/film
โญ 432
Last Checked
1 month ago
Abstract
Achieving artificial visual reasoning - the ability to answer image-related questions which require a multi-step, high-level process - is an important step towards artificial general intelligence. This multi-modal task requires learning a question-dependent, structured reasoning process over images from language. Standard deep learning approaches tend to exploit biases in the data rather than learn this underlying structure, while leading methods learn to visually reason successfully but are hand-crafted for reasoning. We show that a general-purpose, Conditional Batch Normalization approach achieves state-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4% error rate. We outperform the next best end-to-end method (4.5%) and even methods that use extra supervision (3.1%). We probe our model to shed light on how it reasons, showing it has learned a question-dependent, multi-step process. Previous work has operated under the assumption that visual reasoning calls for a specialized architecture, but we show that a general architecture with proper conditioning can learn to visually reason effectively.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Computer Vision
๐
๐
Old Age
๐
๐
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R.I.P.
๐ป
Ghosted
You Only Look Once: Unified, Real-Time Object Detection
๐
๐
Old Age
SSD: Single Shot MultiBox Detector
๐
๐
Old Age
Squeeze-and-Excitation Networks
R.I.P.
๐ป
Ghosted