Mean Box Pooling: A Rich Image Representation and Output Embedding for the Visual Madlibs Task

August 09, 2016 · Declared Dead · 🏛 British Machine Vision Conference

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Ashkan Mokarian, Mateusz Malinowski, Mario Fritz
arXiv ID: 1608.02717
Category: cs.CV (Computer Vision)
Cross-listed: cs.AI, cs.CL, cs.LG
Citations: 5
Venue: British Machine Vision Conference
Last Checked: 3 months ago

Abstract
We present Mean Box Pooling, a novel visual representation that pools over CNN representations of a large number of highly overlapping object proposals. We show that this representation, together with nCCA, a successful multimodal embedding technique, achieves state-of-the-art performance on the Visual Madlibs task. Moreover, inspired by nCCA's objective function, we extend the classical CNN+LSTM approach to train the network by directly maximizing the similarity between the internal representation of the deep learning architecture and candidate answers. Again, this approach achieves a significant improvement over prior work that also uses a CNN+LSTM approach on Visual Madlibs.
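
Since no official code is linked, the following is only a minimal sketch of the idea as the abstract describes it, not the authors' implementation. It assumes each object proposal has already been turned into a CNN feature vector, and that the pooled image vector and the candidate answer vectors already live in a shared embedding space (the nCCA projection is omitted). The function names `mean_box_pool`, `cosine_similarity`, and `pick_answer` are hypothetical, and cosine similarity is an assumed choice of similarity measure.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_box_pool(box_features: Sequence[Sequence[float]]) -> Vector:
    """Average per-proposal feature vectors into one image-level vector."""
    if not box_features:
        raise ValueError("need at least one box feature vector")
    dim = len(box_features[0])
    if any(len(f) != dim for f in box_features):
        raise ValueError("all box feature vectors must have the same length")
    n = len(box_features)
    return [sum(f[i] for f in box_features) / n for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; an assumed stand-in for the paper's similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_answer(image_vec: Sequence[float],
                candidate_vecs: Sequence[Sequence[float]]) -> int:
    """Return the index of the candidate most similar to the image vector."""
    scores = [cosine_similarity(image_vec, c) for c in candidate_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy usage: two proposals with 2-d features, three candidate answers.
pooled = mean_box_pool([[1.0, 0.0], [3.0, 2.0]])
best = pick_answer(pooled, [[0.0, 1.0], [2.0, 1.0], [-1.0, 0.0]])
```

In a multiple-choice setting such as Visual Madlibs, a selection step like `pick_answer` would be applied once per question over the provided candidate answers.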
