A Better Way to Attend: Attention with Trees for Video Question Answering
September 05, 2019 ยท Entered Twilight ยท ๐ IEEE Transactions on Image Processing
"Last commit was 6.0 years ago (โฅ5 year threshold)"
Evidence collected by the PWNC Scanner
Repo contents: README.md, overview.pdf, overview.png, r.png, treeMN-non-hierachical, treeMN
Authors
Hongyang Xue, Wenqing Chu, Zhou Zhao, Deng Cai
arXiv ID
1909.02218
Category
cs.CV: Computer Vision
Cross-listed
cs.CL,
cs.LG
Citations
33
Venue
IEEE Transactions on Image Processing
Repository
https://github.com/ZJULearning/TreeAttention
โญ 25
Last Checked
1 month ago
Abstract
We propose a new attention model for video question answering. The main idea of the attention models is to locate on the most informative parts of the visual data. The attention mechanisms are quite popular these days. However, most existing visual attention mechanisms regard the question as a whole. They ignore the word-level semantics where each word can have different attentions and some words need no attention. Neither do they consider the semantic structure of the sentences. Although the Extended Soft Attention (E-SA) model for video question answering leverages the word-level attention, it performs poorly on long question sentences. In this paper, we propose the heterogeneous tree-structured memory network (HTreeMN) for video question answering. Our proposed approach is based upon the syntax parse trees of the question sentences. The HTreeMN treats the words differently where the \textit{visual} words are processed with an attention module and the \textit{verbal} ones not. It also utilizes the semantic structure of the sentences by combining the neighbors based on the recursive structure of the parse trees. The understandings of the words and the videos are propagated and merged from leaves to the root. Furthermore, we build a hierarchical attention mechanism to distill the attended features. We evaluate our approach on two datasets. The experimental results show the superiority of our HTreeMN model over the other attention models especially on complex questions. Our code is available on github. Our code is available at https://github.com/ZJULearning/TreeAttention
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Computer Vision
๐
๐
Old Age
๐
๐
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R.I.P.
๐ป
Ghosted
You Only Look Once: Unified, Real-Time Object Detection
๐
๐
Old Age
SSD: Single Shot MultiBox Detector
๐
๐
Old Age
Squeeze-and-Excitation Networks
R.I.P.
๐ป
Ghosted