Test-time Adaptation vs. Training-time Generalization: A Case Study in Human Instance Segmentation using Keypoints Estimation

December 12, 2022 · Declared Dead · 🏛 2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW)

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Kambiz Azarian, Debasmit Das, Hyojin Park, Fatih Porikli
arXiv ID: 2212.06242
Category: cs.CV (Computer Vision)
Cross-listed: cs.LG
Citations: 5
Venue: 2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW)
Last Checked: 3 months ago

Abstract

We consider the problem of improving the human instance segmentation mask quality for a given test image using keypoints estimation. We compare two alternative approaches. The first approach is a test-time adaptation (TTA) method, where we allow test-time modification of the segmentation network's weights using a single unlabeled test image. In this approach, we do not assume test-time access to the labeled source dataset. More specifically, our TTA method consists of using the keypoints estimates as pseudo labels and backpropagating them to adjust the backbone weights. The second approach is a training-time generalization (TTG) method, where we permit offline access to the labeled source dataset but not the test-time modification of weights. Furthermore, we do not assume the availability of any images from or knowledge about the target domain. Our TTG method consists of augmenting the backbone features with those generated by the keypoints head and feeding the aggregate vector to the mask head. Through a comprehensive set of ablations, we evaluate both approaches and identify several factors limiting the TTA gains. In particular, we show that in the absence of a significant domain shift, TTA may hurt and TTG shows only a small gain in performance, whereas for a large domain shift, TTA gains are smaller and dependent on the heuristics used, while TTG gains are larger and robust to architectural choices.
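
The two approaches described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the loss, the network, or how features are aggregated. The sketch assumes a scalar per-pixel "backbone" with parameters w and b, a frozen scalar mask head a, a cross-entropy pseudo-label loss that pushes the mask toward "person" at the estimated keypoint pixels, and plain concatenation for the TTG feature aggregation. All function and variable names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mask_probs(x, w, b, a):
    """Toy pipeline: backbone feature w*x_i + b per pixel, scaled by a frozen mask head a."""
    return [sigmoid(a * (w * xi + b)) for xi in x]


def pseudo_label_loss(probs, keypoint_idx):
    """Assumed loss: pixels at estimated keypoints should be labelled 'person' (target 1)."""
    return -sum(math.log(probs[i]) for i in keypoint_idx) / len(keypoint_idx)


def tta_step(x, w, b, a, keypoint_idx, lr=0.5):
    """One gradient step on the backbone parameters (w, b) only; the mask head a is frozen."""
    probs = mask_probs(x, w, b, a)
    n = len(keypoint_idx)
    grad_w = 0.0
    grad_b = 0.0
    for i in keypoint_idx:
        dlogit = (probs[i] - 1.0) / n  # d(loss)/d(mask logit) for target 1
        grad_w += dlogit * a * x[i]    # chain rule through the backbone weight
        grad_b += dlogit * a           # chain rule through the backbone bias
    return w - lr * grad_w, b - lr * grad_b


def ttg_mask_logit(backbone_feat, keypoint_feat, head_weights):
    """TTG-style head (assumed): concatenate both feature vectors, then apply a linear mask head."""
    aggregate = list(backbone_feat) + list(keypoint_feat)
    if len(aggregate) != len(head_weights):
        raise ValueError("mask head must have one weight per aggregated feature")
    return sum(f * h for f, h in zip(aggregate, head_weights))


# TTA on a single unlabeled 1-D "image": keypoint estimates at pixels 2 and 5.
x = [0.1, 0.2, 0.9, 0.3, 0.2, 0.8]
keypoints = [2, 5]
w, b, a = 0.5, -0.5, 2.0
loss_before = pseudo_label_loss(mask_probs(x, w, b, a), keypoints)
w_new, b_new = tta_step(x, w, b, a, keypoints)
loss_after = pseudo_label_loss(mask_probs(x, w_new, b_new, a), keypoints)

# TTG: the weights are fixed at test time; the mask head simply sees extra features.
ttg_logit = ttg_mask_logit([1.0, 2.0], [3.0], [0.5, 0.5, 1.0])
```

In this sketch the TTA step changes only the backbone parameters and needs no source data, while the TTG head needs no test-time update at all. That mirrors the trade-off the abstract compares, under the stated assumptions.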