ChatVTG: Video Temporal Grounding via Chat with Video Dialogue Large Language Models
October 01, 2024 ยท Declared Dead ยท ๐ 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Mengxue Qu, Xiaodong Chen, Wu Liu, Alicia Li, Yao Zhao
arXiv ID
2410.12813
Category
cs.MM: Multimedia
Cross-listed
cs.CV
Citations
40
Venue
2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Last Checked
1 month ago
Abstract
Video Temporal Grounding (VTG) aims to ground specific segments within an untrimmed video corresponding to the given natural language query. Existing VTG methods largely depend on supervised learning and extensive annotated data, which is labor-intensive and prone to human biases. To address these challenges, we present ChatVTG, a novel approach that utilizes Video Dialogue Large Language Models (LLMs) for zero-shot video temporal grounding. Our ChatVTG leverages Video Dialogue LLMs to generate multi-granularity segment captions and matches these captions with the given query for coarse temporal grounding, circumventing the need for paired annotation data. Furthermore, to obtain more precise temporal grounding results, we employ moment refinement for fine-grained caption proposals. Extensive experiments on three mainstream VTG datasets, including Charades-STA, ActivityNet-Captions, and TACoS, demonstrate the effectiveness of ChatVTG. Our ChatVTG surpasses the performance of current zero-shot methods.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Multimedia
R.I.P.
๐ป
Ghosted
๐
๐
Old Age
Quality Assessment of In-the-Wild Videos
R.I.P.
๐ป
Ghosted
Viewport-Adaptive Navigable 360-Degree Video Delivery
R.I.P.
๐ป
Ghosted
A Comprehensive Survey on Cross-modal Retrieval
R.I.P.
๐ป
Ghosted
An Overview of Cross-media Retrieval: Concepts, Methodologies, Benchmarks and Challenges
R.I.P.
๐ป
Ghosted
A Convolutional Neural Network Approach for Post-Processing in HEVC Intra Coding
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Language Models are Few-Shot Learners
R.I.P.
๐ป
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
๐ป
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
๐ป
Ghosted