Segment Anything Across Shots: A Method and Benchmark

November 17, 2025 · arXiv.org


Repository contents: .gitignore, CODE_OF_CONDUCT.md, LICENSE, LICENSE_cctorch, MANIFEST.in, README.md, assets, demo, pyproject.toml, sam2, setup.py, tools

Authors: Hengrui Hu, Kaining Ying, Henghui Ding
arXiv ID: 2511.13715
Category: cs.CV (Computer Vision)
Citations: 0
Venue: arXiv.org
Repository: https://github.com/FudanCVL/SAAS (⭐ 27)
Abstract
This work focuses on multi-shot semi-supervised video object segmentation (MVOS), which aims to segment the target object indicated by an initial mask throughout a video containing multiple shots. Existing VOS methods focus mainly on single-shot videos and struggle with shot discontinuities, which limits their real-world applicability. We propose a transition mimicking data augmentation strategy (TMA), which enables cross-shot generalization from single-shot data and thereby alleviates the severe scarcity of annotated multi-shot data, and the Segment Anything Across Shots (SAAS) model, which can detect and comprehend shot transitions effectively. To support evaluation and future study in MVOS, we introduce Cut-VOS, a new MVOS benchmark with dense mask annotations, diverse object categories, and high-frequency transitions. Extensive experiments on YouMVOS and Cut-VOS demonstrate that the proposed SAAS achieves state-of-the-art performance by effectively mimicking, understanding, and segmenting across complex transitions. The code and datasets are released at https://henghuiding.com/SAAS/.
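The abstract does not describe how TMA builds its training samples. As a rough illustration of the general idea of turning a single-shot clip into a pseudo multi-shot sample, here is a minimal NumPy sketch. Everything in it is a hypothetical assumption, not the SAAS implementation: the function name `mimic_hard_cut`, the frame skip used as a temporal jump, and the flip-plus-zoom used to imitate a camera change. Frames and masks receive identical geometric transforms, so the mask labels stay aligned across the fake cut.

```python
import numpy as np


def mimic_hard_cut(frames, masks, cut_at, skip, rng=None):
    """Hypothetical sketch: fake a shot transition inside one single-shot clip.

    frames: (T, H, W, C) array; masks: (T, H, W) array of object labels.
    Frames before `cut_at` are kept unchanged. After the cut, `skip` frames are
    dropped (a temporal jump) and the rest are flipped and zoomed, imitating a
    different camera. This is an illustrative assumption, not the paper's TMA.
    """
    rng = np.random.default_rng() if rng is None else rng
    T, H, W = masks.shape
    if not (0 < cut_at < T - skip):
        raise ValueError("cut_at and skip must leave frames on both sides of the cut")

    pre_f, pre_m = frames[:cut_at], masks[:cut_at]
    post_f, post_m = frames[cut_at + skip:], masks[cut_at + skip:]

    # Simulated camera change, step 1: horizontal flip (axis 2 is width).
    post_f, post_m = post_f[:, :, ::-1], post_m[:, :, ::-1]

    # Step 2: center crop, resized back to (H, W) with nearest-neighbour indexing,
    # so masks keep their original integer labels.
    scale = rng.uniform(1.2, 1.6)
    ch, cw = int(H / scale), int(W / scale)
    top, left = (H - ch) // 2, (W - cw) // 2
    rows = top + (np.arange(H) * ch) // H
    cols = left + (np.arange(W) * cw) // W
    post_f = post_f[:, rows][:, :, cols]
    post_m = post_m[:, rows][:, :, cols]

    out_f = np.concatenate([pre_f, post_f], axis=0)
    out_m = np.concatenate([pre_m, post_m], axis=0)

    # Per-frame flag marking the first frame of the new "shot".
    is_cut = np.zeros(len(out_f), dtype=bool)
    is_cut[cut_at] = True
    return out_f, out_m, is_cut
```

Returning an explicit cut flag is one way a model could be given a transition-detection target; the abstract does not say whether SAAS uses such a signal.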