Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation
August 17, 2016 ยท Declared Dead ยท ๐ ACM Multimedia
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Rakshith Shetty, Jorma Laaksonen
arXiv ID
1608.04959
Category
cs.CV: Computer Vision
Citations
94
Venue
ACM Multimedia
Last Checked
3 months ago
Abstract
We present our submission to the Microsoft Video to Language Challenge of generating short captions describing videos in the challenge dataset. Our model is based on the encoder--decoder pipeline, popular in image and video captioning systems. We propose to utilize two different kinds of video features, one to capture the video content in terms of objects and attributes, and the other to capture the motion and action information. Using these diverse features we train models specializing in two separate input sub-domains. We then train an evaluator model which is used to pick the best caption from the pool of candidates generated by these domain expert models. We argue that this approach is better suited for the current video captioning task, compared to using a single model, due to the diversity in the dataset. Efficacy of our method is proven by the fact that it was rated best in MSR Video to Language Challenge, as per human evaluation. Additionally, we were ranked second in the automatic evaluation metrics based table.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Computer Vision
๐
๐
Old Age
๐
๐
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R.I.P.
๐ป
Ghosted
You Only Look Once: Unified, Real-Time Object Detection
๐
๐
Old Age
SSD: Single Shot MultiBox Detector
๐
๐
Old Age
Squeeze-and-Excitation Networks
R.I.P.
๐ป
Ghosted
Rethinking the Inception Architecture for Computer Vision
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Language Models are Few-Shot Learners
R.I.P.
๐ป
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
๐ป
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
๐ป
Ghosted