π
π
Old Age
OpenGVL -- Benchmarking Visual Temporal Progress for Data Curation
September 22, 2025 Β· Declared Dead Β· π arXiv.org
Authors
PaweΕ Budzianowski, Emilia WiΕnios, MichaΕ Tyrolski, Gracjan GΓ³ral, Igor Kulakov, Viktor Petrenko, Krzysztof Walas
arXiv ID
2509.17321
Category
cs.RO: Robotics
Cross-listed
cs.CL
Citations
0
Venue
arXiv.org
Repository
https://github.com/budzianowski/opengvl}{OpenGVL}
Last Checked
2 months ago
Abstract
Data scarcity remains one of the most limiting factors in driving progress in robotics. However, the amount of available robotics data in the wild is growing exponentially, creating new opportunities for large-scale data utilization. Reliable temporal task completion prediction could help automatically annotate and curate this data at scale. The Generative Value Learning (GVL) approach was recently proposed, leveraging the knowledge embedded in vision-language models (VLMs) to predict task progress from visual observations. Building upon GVL, we propose OpenGVL, a comprehensive benchmark for estimating task progress across diverse challenging manipulation tasks involving both robotic and human embodiments. We evaluate the capabilities of publicly available open-source foundation models, showing that open-source model families significantly underperform closed-source counterparts, achieving only approximately $70\%$ of their performance on temporal progress prediction tasks. Furthermore, we demonstrate how OpenGVL can serve as a practical tool for automated data curation and filtering, enabling efficient quality assessment of large-scale robotics datasets. We release the benchmark along with the complete codebase at \href{github.com/budzianowski/opengvl}{OpenGVL}.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Robotics
R.I.P.
π»
Ghosted
ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras
R.I.P.
π»
Ghosted
VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator
R.I.P.
π»
Ghosted
ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM
R.I.P.
π»
Ghosted
Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World
R.I.P.
π»
Ghosted
Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age
Died the same way β π 404 Not Found
R.I.P.
π
404 Not Found
Deep High-Resolution Representation Learning for Visual Recognition
R.I.P.
π
404 Not Found
HuggingFace's Transformers: State-of-the-art Natural Language Processing
R.I.P.
π
404 Not Found
CCNet: Criss-Cross Attention for Semantic Segmentation
R.I.P.
π
404 Not Found