Open-Ended Visual Question-Answering

October 09, 2016 ยท Entered Twilight ยท ๐Ÿ› arXiv.org

๐ŸŒ… TWILIGHT: Old Age
Predates the code-sharing era โ€” a pioneer of its time

"No code URL or promise found in abstract"
"Derived repo from GitHub Pages (backfill)"

Evidence collected by the PWNC Scanner

Repo contents: .gitignore, README.md, bin, data, requirements.txt, setup.py, tests, vqa

Authors Issey Masuda, Santiago Pascual de la Puente, Xavier Giro-i-Nieto arXiv ID 1610.02692 Category cs.CL: Computation & Language Cross-listed cs.CV, cs.MM Citations 9 Venue arXiv.org Repository https://github.com/imatge-upc/vqa-2016-cvprw โญ 41 Last Checked 1 month ago
Abstract
This thesis report studies methods to solve Visual Question-Answering (VQA) tasks with a Deep Learning framework. As a preliminary step, we explore Long Short-Term Memory (LSTM) networks used in Natural Language Processing (NLP) to tackle Question-Answering (text based). We then modify the previous model to accept an image as an input in addition to the question. For this purpose, we explore the VGG-16 and K-CNN convolutional neural networks to extract visual features from the image. These are merged with the word embedding or with a sentence embedding of the question to predict the answer. This work was successfully submitted to the Visual Question Answering Challenge 2016, where it achieved a 53,62% of accuracy in the test dataset. The developed software has followed the best programming practices and Python code style, providing a consistent baseline in Keras for different configurations.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Computation & Language

๐ŸŒ… ๐ŸŒ… Old Age

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, ... (+6 more)

cs.CL ๐Ÿ› NeurIPS ๐Ÿ“š 166.0K cites 8 years ago