Adaptively Aligned Image Captioning via Adaptive Attention Time
September 19, 2019 · Entered Twilight · Neural Information Processing Systems
"Last commit was 6.0 years ago (≥5 year threshold)"
Evidence collected by the PWNC Scanner
Repo contents: ADVANCED.md, LICENSE, README.md, data, dataloader.py, dataloaderraw.py, eval.py, eval_ensemble.py, eval_utils.py, misc, models, opts.py, scripts, test-best.sh, test-last.sh, train-aat.sh, train.py, vis
Authors
Lun Huang, Wenmin Wang, Yaxian Xia, Jie Chen
arXiv ID
1909.09060
Category
cs.CV: Computer Vision
Cross-listed
cs.CL
Citations
67
Venue
Neural Information Processing Systems
Repository
https://github.com/husthuaan/AAT
⭐ 51
Last Checked
1 month ago
Abstract
Recent neural models for image captioning usually employ an encoder-decoder framework with an attention mechanism. However, the attention mechanism in such a framework aligns one single (attended) image feature vector to one caption word, assuming a one-to-one mapping between source image regions and target caption words, which is never possible. In this paper, we propose a novel attention model, namely Adaptive Attention Time (AAT), to align the source and the target adaptively for image captioning. AAT allows the framework to learn how many attention steps to take to output a caption word at each decoding step. With AAT, an image region can be mapped to an arbitrary number of caption words while a caption word can also attend to an arbitrary number of image regions. AAT is deterministic and differentiable, and doesn't introduce any noise to the parameter gradients. We empirically show that AAT improves over state-of-the-art methods on the task of image captioning. Code is available at https://github.com/husthuaan/AAT.
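The abstract does not spell out how the number of attention steps is chosen, so the following is only an illustrative sketch, not the authors' implementation. It assumes a halting rule in the style of adaptive computation time: each attention step emits a confidence in [0, 1], stepping stops once the leftover probability mass drops below a threshold, and the per-step outputs are blended with normalized weights. The names `halting_weights`, `combine`, `epsilon`, and `max_steps` are hypothetical and do not come from the paper or the repository.

```python
from typing import List, Sequence, Tuple


def halting_weights(
    confidences: Sequence[float],
    epsilon: float = 0.01,
    max_steps: int = 4,
) -> Tuple[int, List[float]]:
    """Return (steps_taken, normalized_weights) for one decoding step.

    Assumed rule (not taken from the paper): stop as soon as the
    probability mass not yet assigned to any step falls below epsilon,
    or when max_steps attention steps have been taken.
    """
    if not confidences or max_steps < 1:
        raise ValueError("need at least one confidence and max_steps >= 1")

    weights: List[float] = []
    remaining = 1.0  # mass not yet claimed by an earlier attention step
    for p in confidences[:max_steps]:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each confidence must lie in [0, 1]")
        weights.append(remaining * p)
        remaining *= 1.0 - p
        if remaining < epsilon:
            break

    total = sum(weights)
    if total == 0.0:
        # No step expressed any confidence: fall back to a uniform blend.
        weights = [1.0 / len(weights)] * len(weights)
    else:
        weights = [w / total for w in weights]
    return len(weights), weights


def combine(step_outputs: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Blend per-step output vectors using the halting weights."""
    dim = len(step_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, step_outputs))
        for i in range(dim)
    ]


# A confident second step ends attention early: only two steps are used.
steps, weights = halting_weights([0.5, 0.99, 0.3])
context = combine([[1.0, 0.0], [0.0, 1.0]], weights)
```

In this sketch no sampling is involved, so the step count is deterministic, and for a fixed step count the blended output is a smooth function of the confidences. That matches the abstract's description of AAT only loosely; the exact formulation is in the paper.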
Similar Papers
In the same crypt: Computer Vision
Old Age
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R.I.P.
Ghosted
You Only Look Once: Unified, Real-Time Object Detection
Old Age
SSD: Single Shot MultiBox Detector
Old Age
Squeeze-and-Excitation Networks
R.I.P.
Ghosted