Retrieval-Augmented Transformer for Image Captioning

July 26, 2022 · Declared Dead · 🏛 International Conference on Content-Based Multimedia Indexing

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
arXiv ID: 2207.13162
Category: cs.CV (Computer Vision)
Cross-listed: cs.AI, cs.CL, cs.MM
Citations: 70
Venue: International Conference on Content-Based Multimedia Indexing
Last Checked: 3 months ago

Abstract
Image captioning models aim at connecting Vision and Language by providing natural language descriptions of input images. In the past few years, the task has been tackled by learning parametric models and proposing visual feature extraction advancements or by modeling better multi-modal connections. In this paper, we investigate the development of an image captioning approach with a kNN memory, with which knowledge can be retrieved from an external corpus to aid the generation process. Our architecture combines a knowledge retriever based on visual similarities, a differentiable encoder, and a kNN-augmented attention layer to predict tokens based on the past context and on text retrieved from the external memory. Experimental results, conducted on the COCO dataset, demonstrate that employing an explicit external memory can aid the generation process and increase caption quality. Our work opens up new avenues for improving image captioning models at larger scale.
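
The retrieval step mentioned in the abstract (fetching text from an external memory based on visual similarity) can be illustrated with a toy nearest-neighbour lookup. This is only a minimal sketch under stated assumptions, not the authors' implementation, for which no code was found: the cosine-similarity scoring, the `retrieve_captions` helper, the memory layout, and the 3-dimensional embeddings are all hypothetical and chosen for readability.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_captions(query, memory, k=2):
    """Return the captions of the k memory entries most similar to `query`.

    `memory` is a list of (embedding, caption) pairs. This layout is an
    assumption made for the sketch, not something the abstract specifies.
    """
    scored = [(cosine_similarity(query, emb), cap) for emb, cap in memory]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cap for _, cap in scored[:k]]


# Toy external memory with made-up 3-d image embeddings (illustrative only).
memory = [
    ([1.0, 0.0, 0.0], "a dog running on grass"),
    ([0.9, 0.1, 0.0], "a puppy playing in a park"),
    ([0.0, 1.0, 0.0], "a plate of pasta on a table"),
]
query_embedding = [1.0, 0.05, 0.0]
retrieved = retrieve_captions(query_embedding, memory, k=2)
print(retrieved)
```

In a captioning model of the kind the abstract describes, the retrieved captions would then be encoded and made available to the decoder's attention alongside the past context; that part is omitted here because the abstract does not give enough detail to sketch it faithfully.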
