Exploiting Spatial Invariance for Scalable Unsupervised Object Tracking

November 20, 2019 · Entered Twilight · 🏛 AAAI Conference on Artificial Intelligence

"Last commit was 5.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: .gitignore, LICENSE, README.md, beluga_requirements.txt, download_atari_data.sh, install.sh, requirements.txt, setup.py, silot, slurm_build_local_env.sh

Authors: Eric Crawford, Joelle Pineau
arXiv ID: 1911.09033
Category: cs.LG (Machine Learning)
Cross-listed: cs.CV, stat.ML
Citations: 68
Venue: AAAI Conference on Artificial Intelligence
Repository: https://github.com/e2crawfo/silot ⭐ 13
Last Checked: 1 month ago

Abstract

The ability to detect and track objects in the visual world is a crucial skill for any intelligent agent, as it is a necessary precursor to any object-level reasoning process. Moreover, it is important that agents learn to track objects without supervision (i.e., without access to annotated training videos), since this will allow agents to begin operating in new environments with minimal human assistance. The task of learning to discover and track objects in videos, which we call unsupervised object tracking, has grown in prominence in recent years; however, most architectures that address it still struggle to deal with large scenes containing many objects. In the current work, we propose an architecture that scales well to the large-scene, many-object setting by employing spatially invariant computations (convolutions and spatial attention) and representations (a spatially local object specification scheme). In a series of experiments, we demonstrate a number of attractive features of our architecture; most notably, that it outperforms competing methods at tracking objects in cluttered scenes with many objects, and that it can generalize well to videos that are larger and/or contain more objects than videos encountered during training.
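
The abstract attributes the architecture's ability to handle larger scenes to spatially invariant computations such as convolutions. The following is a minimal NumPy sketch of that general property only, not the SILOT implementation: the filter, the `scene` helper, and the 3x3 "object" patch are hypothetical stand-ins chosen for illustration. It shows that one fixed filter can be applied to scenes of different sizes, and that its response to an object is the same wherever the object sits.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Plain 2D cross-correlation with 'valid' padding (no batching, no learning)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3))  # hypothetical filter, standing in for a learned one
patch = rng.normal(size=(3, 3))   # hypothetical 3x3 "object"


def scene(size, top, left):
    """Empty square scene with the object patch placed at (top, left)."""
    img = np.zeros((size, size))
    img[top:top + 3, left:left + 3] = patch
    return img


# Same filter, two scene sizes, two object positions.
small = conv2d_valid(scene(8, 2, 2), kernel)    # 8x8 scene  -> 6x6 response map
large = conv2d_valid(scene(16, 9, 5), kernel)   # 16x16 scene -> 14x14 response map

# Output position (i, j) covers input rows i..i+2 and columns j..j+2, so the
# windows that overlap the object start up to 2 pixels before its top-left corner.
resp_small = small[0:5, 0:5]
resp_large = large[7:12, 3:8]

# The local response is identical regardless of scene size or object position.
same_response = np.allclose(resp_small, resp_large)
print(small.shape, large.shape, same_response)
```

Because the filter weights do not depend on image size or absolute position, nothing has to be re-learned when the scene grows; this is the sense in which the toy example mirrors the generalization claim in the abstract.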