Seeing Eye to AI: Comparing Human Gaze and Model Attention in Video Memorability

November 26, 2023 · Declared Dead · 🏛 IEEE Workshop/Winter Conference on Applications of Computer Vision

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Prajneya Kumar, Eshika Khandelwal, Makarand Tapaswi, Vishnu Sreekumar arXiv ID 2311.16484 Category cs.CV: Computer Vision Citations 4 Venue IEEE Workshop/Winter Conference on Applications of Computer Vision Last Checked 3 months ago

Abstract

Understanding what makes a video memorable has important applications in advertising or education technology. Towards this goal, we investigate spatio-temporal attention mechanisms underlying video memorability. Different from previous works that fuse multiple features, we adopt a simple CNN+Transformer architecture that enables analysis of spatio-temporal attention while matching state-of-the-art (SoTA) performance on video memorability prediction. We compare model attention against human gaze fixations collected through a small-scale eye-tracking study where humans perform the video memory task. We uncover the following insights: (i) Quantitative saliency metrics show that our model, trained only to predict a memorability score, exhibits similar spatial attention patterns to human gaze, especially for more memorable videos. (ii) The model assigns greater importance to initial frames in a video, mimicking human attention patterns. (iii) Panoptic segmentation reveals that both (model and humans) assign a greater share of attention to things and less attention to stuff as compared to their occurrence probability.