CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering
November 07, 2022 ยท Entered Twilight ยท ๐ Conference on Empirical Methods in Natural Language Processing
"No code URL or promise found in abstract"
"Code repo scraped from project page (backfill)"
Evidence collected by the PWNC Scanner
Repo contents: Aloe-star, LICENSE, README.md, dataset, evaluation, tmp
Authors
Maitreya Patel, Tejas Gokhale, Chitta Baral, Yezhou Yang
arXiv ID
2211.03779
Category
cs.CV: Computer Vision
Cross-listed
cs.CL
Citations
13
Venue
Conference on Empirical Methods in Natural Language Processing
Repository
https://github.com/Maitreyapatel/CRIPP-VQA
โญ 9
Last Checked
6 days ago
Abstract
Videos often capture objects, their visible properties, their motion, and the interactions between different objects. Objects also have physical properties such as mass, which the imaging pipeline is unable to directly capture. However, these properties can be estimated by utilizing cues from relative object motion and the dynamics introduced by collisions. In this paper, we introduce CRIPP-VQA, a new video question answering dataset for reasoning about the implicit physical properties of objects in a scene. CRIPP-VQA contains videos of objects in motion, annotated with questions that involve counterfactual reasoning about the effect of actions, questions about planning in order to reach a goal, and descriptive questions about visible properties of objects. The CRIPP-VQA test set enables evaluation under several out-of-distribution settings -- videos with objects with masses, coefficients of friction, and initial velocities that are not observed in the training distribution. Our experiments reveal a surprising and significant performance gap in terms of answering questions about implicit properties (the focus of this paper) and explicit properties of objects (the focus of prior work).
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Computer Vision
๐
๐
Old Age
๐
๐
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R.I.P.
๐ป
Ghosted
You Only Look Once: Unified, Real-Time Object Detection
๐
๐
Old Age
SSD: Single Shot MultiBox Detector
๐
๐
Old Age
Squeeze-and-Excitation Networks
R.I.P.
๐ป
Ghosted