Segment Anything Across Shots: A Method and Benchmark

November 17, 2025 · arXiv.org

"No code URL or promise found in abstract"
"Code repo scraped from project page (backfill)"

Evidence collected by the PWNC Scanner

Repo contents: .gitignore, CODE_OF_CONDUCT.md, LICENSE, LICENSE_cctorch, MANIFEST.in, README.md, assets, demo, pyproject.toml, sam2, setup.py, tools

Authors: Hengrui Hu, Kaining Ying, Henghui Ding
arXiv ID: 2511.13715
Category: cs.CV (Computer Vision)
Citations: 0
Venue: arXiv.org
Repository: https://github.com/FudanCVL/SAAS (⭐ 27)
Last checked: 5 days ago

Abstract

This work focuses on multi-shot semi-supervised video object segmentation (MVOS), which aims to segment the target object indicated by an initial mask throughout a video containing multiple shots. Existing VOS methods mainly target single-shot videos and struggle with shot discontinuities, which limits their real-world applicability. We propose a transition mimicking data augmentation strategy (TMA), which enables cross-shot generalization from single-shot data and thereby alleviates the severe scarcity of annotated multi-shot data, together with the Segment Anything Across Shots (SAAS) model, which can effectively detect and comprehend shot transitions. To support evaluation and future study in MVOS, we introduce Cut-VOS, a new MVOS benchmark with dense mask annotations, diverse object categories, and high-frequency transitions. Extensive experiments on YouMVOS and Cut-VOS demonstrate that the proposed SAAS achieves state-of-the-art performance by effectively mimicking, understanding, and segmenting across complex transitions. The code and datasets are released at https://henghuiding.com/SAAS/.
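The abstract does not say how TMA synthesizes transitions, so the following is only a minimal sketch of the general idea under our own assumptions, not the SAAS implementation. It fakes a shot cut inside a single annotated single-shot clip by skipping ahead in time and mirroring the later frames, and it applies the same transform to the masks so the target label stays aligned across the synthetic cut. The function name `mimic_shot_cut` and its parameters (`cut`, `jump`, `flip`) are hypothetical.

```python
import numpy as np


def mimic_shot_cut(frames, masks, cut, jump, flip=True):
    """Hypothetical transition-mimicking augmentation (illustrative only).

    frames: (T, H, W, C) array taken from one single-shot video.
    masks:  (T, H, W) binary masks of the target object.
    cut:    index of the first frame of the synthetic "second shot".
    jump:   number of source frames skipped at the cut (a simulated time gap).
    flip:   mirror the second shot horizontally to mimic a viewpoint change.

    Returns (frames, masks, cut) for a clip of length T - jump.
    """
    num_frames = frames.shape[0]
    if cut <= 0 or jump < 0 or cut + jump >= num_frames:
        raise ValueError("need 0 < cut, jump >= 0 and cut + jump < T")

    # First "shot": the original frames up to the cut.
    first_f, first_m = frames[:cut], masks[:cut]
    # Second "shot": resume later in the same video, skipping `jump` frames.
    second_f, second_m = frames[cut + jump:], masks[cut + jump:]

    if flip:
        # Reverse the width axis of both frames and masks with the same slice,
        # so the mask still covers the target after the transform.
        second_f = second_f[:, :, ::-1]
        second_m = second_m[:, :, ::-1]

    out_f = np.concatenate([first_f, second_f], axis=0)
    out_m = np.concatenate([first_m, second_m], axis=0)
    return out_f, out_m, cut


if __name__ == "__main__":
    clip = np.random.rand(10, 4, 6, 3)
    target = np.zeros((10, 4, 6), dtype=bool)
    target[:, 1:3, 0:2] = True  # object sits at the left edge
    f, m, c = mimic_shot_cut(clip, target, cut=4, jump=3)
    print(f.shape, m.shape, c)
```

Because the second shot reuses the same video, the target's identity is preserved across the fake cut, while its position and appearance change abruptly, which is the kind of discontinuity a multi-shot model must handle. Real augmentations would likely use richer transforms (crops, zooms, color shifts) than a single horizontal flip.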