Attention on Attention for Image Captioning

August 19, 2019 · Entered Twilight · 🏛 IEEE International Conference on Computer Vision

🌅 TWILIGHT: Old Age
Predates the code-sharing era: a pioneer of its time

"Last commit was 6.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: .gitmodules, ADVANCED.md, LICENSE, README.md, cider, coco-caption, data, dataloader.py, dataloaderraw.py, eval.py, eval_ensemble.py, eval_utils.py, misc, models, opts.py, scripts, test-best.sh, test-last.sh, train-wo-refining.sh, train.py, train.sh, vis

Authors: Lun Huang, Wenmin Wang, Jie Chen, Xiao-Yong Wei
arXiv ID: 1908.06954
Category: cs.CV (Computer Vision)
Citations: 974
Venue: IEEE International Conference on Computer Vision
Repository: https://github.com/husthuaan/AoANet ⭐ 339
Last Checked: 1 month ago
Abstract
Attention mechanisms are widely used in current encoder/decoder frameworks of image captioning, where a weighted average on encoded vectors is generated at each time step to guide the caption decoding process. However, the decoder has little idea of whether or how well the attended vector and the given attention query are related, which could lead the decoder to produce misleading results. In this paper, we propose an Attention on Attention (AoA) module, which extends the conventional attention mechanisms to determine the relevance between attention results and queries. AoA first generates an information vector and an attention gate using the attention result and the current context, then adds another attention by applying element-wise multiplication to them and finally obtains the attended information, the expected useful knowledge. We apply AoA to both the encoder and the decoder of our image captioning model, which we name AoA Network (AoANet). Experiments show that AoANet outperforms all previously published methods and achieves a new state-of-the-art performance of 129.8 CIDEr-D score on MS COCO Karpathy offline test split and 129.6 CIDEr-D (C40) score on the official online testing server. Code is available at https://github.com/husthuaan/AoANet.
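The AoA step described in the abstract (information vector, attention gate, element-wise product) can be sketched in plain Python for a single query. This is a minimal illustration, not the AoANet repository code. It assumes scaled dot-product attention as the underlying attention f_att and the formulation i = W_q^i q + W_v^i v̂ + b^i, g = σ(W_q^g q + W_v^g v̂ + b^g), output = g ⊙ i, where v̂ is the conventional attention result. All function names and weight parameters (`attend`, `aoa`, `Wq_i`, `Wv_i`, `b_i`, `Wq_g`, `Wv_g`, `b_g`) are hypothetical.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x.
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def attend(q, keys, values):
    """Conventional scaled dot-product attention for one query (assumed f_att)."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    dim_v = len(values[0])
    # Weighted average of the value vectors.
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]


def aoa(q, keys, values, Wq_i, Wv_i, b_i, Wq_g, Wv_g, b_g):
    """Hypothetical Attention-on-Attention step: gate the attention result."""
    v_hat = attend(q, keys, values)  # attention result
    # Information vector: linear in the query and the attended vector.
    info = [a + b + c for a, b, c in zip(matvec(Wq_i, q), matvec(Wv_i, v_hat), b_i)]
    # Attention gate: sigmoid of a second linear combination.
    gate = [sigmoid(a + b + c) for a, b, c in zip(matvec(Wq_g, q), matvec(Wv_g, v_hat), b_g)]
    # The "attention on attention": element-wise multiplication.
    return [g * i for g, i in zip(gate, info)]


if __name__ == "__main__":
    I = [[1.0, 0.0], [0.0, 1.0]]
    Z = [[0.0, 0.0], [0.0, 0.0]]
    q = [1.0, 0.0]
    print(aoa(q, I, I, I, I, [0.0, 0.0], Z, Z, [0.0, 0.0]))
```

With the gate weights and bias at zero, σ(0) = 0.5 and the output is half of the information vector; a large negative gate bias drives the output toward zero, which illustrates how the gate can suppress an attended vector judged irrelevant to the query.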