Weakly Supervised Video Summarization by Hierarchical Reinforcement Learning

January 12, 2020 · Declared Dead · 🏛 ACM Multimedia Asia

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Yiyan Chen, Li Tao, Xueting Wang, Toshihiko Yamasaki
arXiv ID: 2001.05864
Category: cs.CV (Computer Vision)
Cross-listed: cs.LG, cs.MM
Citations: 57
Venue: ACM Multimedia Asia
Last Checked: 3 months ago

Abstract

Conventional video summarization approaches based on reinforcement learning suffer from the problem that the reward is received only after the whole summary has been generated. This kind of reward is sparse, which makes it hard for reinforcement learning to converge. Another problem is that labelling each frame is tedious and costly, which usually prohibits the construction of large-scale datasets. To solve these problems, we propose a weakly supervised hierarchical reinforcement learning framework that decomposes the whole task into several subtasks to enhance summarization quality. The framework consists of a manager network and a worker network. For each subtask, the manager is trained to set a subgoal using only a task-level binary label, which requires far fewer labels than conventional approaches. Guided by the subgoal, the worker predicts importance scores for the video frames in the subtask by policy gradient, using both a global reward and newly defined sub-rewards to overcome the reward sparsity problem. Experiments on two benchmark datasets show that our proposal achieves the best performance, outperforming even supervised approaches.
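The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of combining a per-subtask sub-reward with a global reward in a REINFORCE-style update. Everything concrete here is an assumption for illustration and not the authors' method: the fixed-length subtask split, the binary subgoals handed in directly (standing in for the manager network), the `sub_reward` and `global_reward` definitions, and the per-frame logits standing in for the worker network.

```python
import math
import random


def split_into_subtasks(num_frames, subtask_len):
    """Split frame indices into consecutive fixed-length subtasks (assumed scheme)."""
    return [list(range(start, min(start + subtask_len, num_frames)))
            for start in range(0, num_frames, subtask_len)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_actions(logits, rng):
    """Stand-in worker policy: one Bernoulli keep/drop decision per frame.
    The keep probability plays the role of the frame importance score."""
    probs = [sigmoid(z) for z in logits]
    actions = [1 if rng.random() < p else 0 for p in probs]
    return probs, actions


def sub_reward(actions, subgoal):
    """Hypothetical sub-reward: how closely the fraction of kept frames in a
    subtask matches a binary subgoal (1 = important segment, 0 = not)."""
    mean_selected = sum(actions) / len(actions)
    return 1.0 - abs(mean_selected - subgoal)


def global_reward(actions, target_ratio=0.15):
    """Hypothetical global reward: closeness of the summary length to a target
    ratio. A placeholder for whatever whole-summary reward is actually used."""
    ratio = sum(actions) / len(actions)
    return 1.0 - abs(ratio - target_ratio)


def reinforce_logit_grads(probs, actions, returns):
    """REINFORCE gradient w.r.t. each logit of a Bernoulli policy:
    d log pi(a) / d logit = (a - p), scaled by the return for that frame."""
    return [r * (a - p) for p, a, r in zip(probs, actions, returns)]


def train_step(logits, subgoals, subtask_len, rng, lr=0.1):
    """One update: every frame receives its subtask's sub-reward plus the
    global reward, so feedback is no longer limited to one end-of-episode signal."""
    probs, actions = sample_actions(logits, rng)
    subtasks = split_into_subtasks(len(logits), subtask_len)
    g = global_reward(actions)
    returns = [0.0] * len(logits)
    for frames, goal in zip(subtasks, subgoals):
        r = sub_reward([actions[t] for t in frames], goal) + g
        for t in frames:
            returns[t] = r
    grads = reinforce_logit_grads(probs, actions, returns)
    return [z + lr * dz for z, dz in zip(logits, grads)]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [0.0] * 6          # six frames, three subtasks of two frames each
    subgoals = [1, 0, 1]        # illustrative subgoals, not learned here
    for _ in range(200):
        logits = train_step(logits, subgoals, subtask_len=2, rng=rng)
    print([round(sigmoid(z), 2) for z in logits])
```

In this sketch the subgoals are fixed inputs; in the framework described above they would come from a manager trained on task-level binary labels, and the worker would be a network rather than a list of free logits.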