Contrastive Losses Are Natural Criteria for Unsupervised Video Summarization
November 18, 2022 · Entered Twilight · IEEE Workshop/Winter Conference on Applications of Computer Vision
Repo contents: .gitignore, LICENSE, README.md, attention.py, configs, custom_callbacks.py, datasets.py, environment.yml, evaluation.py, knapsack.py, lit_models, main_ablations.py, main_y8.py, modules.py, run_ablation.sh, run_y8.sh, utils.py, vsum_tools.py
Authors
Zongshang Pang, Yuta Nakashima, Mayu Otani, Hajime Nagahara
arXiv ID
2211.10056
Category
cs.CV: Computer Vision
Citations
17
Venue
IEEE Workshop/Winter Conference on Applications of Computer Vision
Repository
https://github.com/pangzss/pytorch-CTVSUM
⭐ 21
Last Checked
1 month ago
Abstract
Video summarization aims to select the most informative subset of frames in a video to facilitate efficient video browsing. Unsupervised methods usually rely on heuristic training objectives such as diversity and representativeness. However, such methods need to bootstrap the online-generated summaries to compute the objectives for importance score regression. We consider such a pipeline inefficient and seek to directly quantify the frame-level importance with the help of contrastive losses in the representation learning literature. Leveraging the contrastive losses, we propose three metrics featuring a desirable key frame: local dissimilarity, global consistency, and uniqueness. With features pre-trained on the image classification task, the metrics can already yield high-quality importance scores, demonstrating competitive or better performance than past heavily-trained methods. We show that by refining the pre-trained features with a lightweight contrastively learned projection module, the frame-level importance scores can be further improved, and the model can also leverage a large number of random videos and generalize to test videos with decent performance. Code available at https://github.com/pangzss/pytorch-CTVSUM.
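The abstract names three frame-level metrics (local dissimilarity, global consistency, uniqueness) but does not define them. The following is a hypothetical, simplified illustration in plain Python, not the paper's definitions or the repository's PyTorch code. It shows how such scores could be computed from frozen per-frame features. Several choices are assumptions: the temporal window of neighbors, the temperature `t`, the `reference` set of frames drawn from other videos, and the averaging of min-max-normalized scores. The uniqueness term reuses the Gaussian-kernel form of the uniformity loss from contrastive representation learning.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    # Project a feature vector onto the unit hypersphere (zero vectors are left as-is).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Inputs are assumed L2-normalized, so the dot product equals the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def local_dissimilarity(z: List[Vector], window: int = 2) -> List[float]:
    # Hypothetical proxy: 1 - mean cosine similarity to temporal neighbors within `window`.
    n, scores = len(z), []
    for i in range(n):
        nbrs = [j for j in range(max(0, i - window), min(n, i + window + 1)) if j != i]
        if not nbrs:
            scores.append(0.0)
            continue
        scores.append(1.0 - sum(cosine(z[i], z[j]) for j in nbrs) / len(nbrs))
    return scores


def global_consistency(z: List[Vector]) -> List[float]:
    # Hypothetical proxy: cosine similarity between each frame and the video's mean direction.
    dim = len(z[0])
    mean = l2_normalize([sum(v[d] for v in z) / len(z) for d in range(dim)])
    return [cosine(v, mean) for v in z]


def uniqueness(z: List[Vector], reference: List[Vector], t: float = 2.0) -> List[float]:
    # Hypothetical proxy: -log mean exp(-t * ||z_i - r||^2) over frames r from other
    # videos (the uniformity-loss kernel); higher means the frame is less common elsewhere.
    # For unit vectors ||z_i - r||^2 <= 4, so exp() cannot underflow at moderate t.
    out = []
    for v in z:
        k = [math.exp(-t * sum((a - b) ** 2 for a, b in zip(v, r))) for r in reference]
        out.append(-math.log(sum(k) / len(k)))
    return out


def minmax(xs: List[float]) -> List[float]:
    # Rescale scores to [0, 1]; a constant list maps to all ones.
    lo, hi = min(xs), max(xs)
    if hi - lo < 1e-12:
        return [1.0] * len(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def importance_scores(features: List[Vector], reference: List[Vector],
                      window: int = 2, t: float = 2.0) -> List[float]:
    # Assumed combination rule: average of the three min-max-normalized metrics.
    z = [l2_normalize(f) for f in features]
    r = [l2_normalize(f) for f in reference]
    parts = [
        minmax(local_dissimilarity(z, window)),
        minmax(global_consistency(z)),
        minmax(uniqueness(z, r, t)),
    ]
    return [sum(p) / 3.0 for p in zip(*parts)]


if __name__ == "__main__":
    # Toy video: a static scene with one visually distinct frame at index 3.
    frames = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0], [1, 0, 0]]
    others = [[1, 0, 0]]  # hypothetical frames sampled from other videos
    print(importance_scores(frames, others, window=1))
```

In this toy run the distinct frame at index 3 receives the highest score. It differs from its temporal neighbors and is rare relative to the reference frames, while its low global consistency pulls the average down. Averaging rather than multiplying the normalized metrics is a deliberate choice: a product would zero out any frame that scores the minimum on a single metric.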
Similar Papers
In the same crypt: Computer Vision
Old Age
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R.I.P.
Ghosted
You Only Look Once: Unified, Real-Time Object Detection
Old Age
SSD: Single Shot MultiBox Detector
Old Age
Squeeze-and-Excitation Networks
R.I.P.
Ghosted