Action Recognition Using Volumetric Motion Representations

November 19, 2019 · Entered Twilight · 🏛 arXiv.org

"Last commit was 7.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: .gitignore, LICENSE, README.md, __init__.py, austin, config.py, datasets.py, datasets_sysu.py, expand_npz.py, feature_manager.py, image processing.ipynb, models.py, ntu_rgb.py, ntu_rgb_utils.py, opengl_viewer, optical_flow.py, progress_meter.py, requirements.txt, runme.sh, save_images.py, sysu_dataset.py, train.py

Authors: Michael Peven, Gregory D. Hager, Austin Reiter
arXiv ID: 1911.08511
Category: cs.CV (Computer Vision)
Cross-listed: eess.IV
Citations: 0
Venue: arXiv.org
Repository: https://github.com/mpeven/ntu_rgb (⭐ 15)
Last Checked: 2 months ago

Abstract

Traditional action recognition models are constructed around the paradigm of 2D perspective imagery. Though sophisticated time-series models have pushed the field forward, confining the domain to 2D still leaves much of the available information unexploited. In this work, we introduce a novel representation of motion as a voxelized 3D vector field and demonstrate how it can be used to improve the performance of action recognition networks. This volumetric representation is a natural fit for 3D CNNs, and it allows out-of-plane data augmentation techniques during the training of these networks. Both the construction of this representation from RGB-D video and inference can be run in real time. We demonstrate superior results using this representation with our network design on the open-source NTU RGB+D dataset, where it outperforms the state of the art on both of the defined evaluation metrics. Furthermore, we experimentally show how the out-of-plane augmentation techniques create viewpoint invariance and allow the model trained using this representation to generalize to unseen camera angles. Code is available here: https://github.com/mpeven/ntu_rgb.
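
To make the idea of a voxelized 3D vector field more concrete, here is a minimal NumPy sketch. It is not taken from the authors' repository: the function names `voxelize_flow` and `rotate_about_y`, the grid size, the coordinate bounds, and the choice to average motion vectors per voxel are all assumptions made for illustration. It assumes 3D points and per-point 3D motion vectors have already been recovered from RGB-D video, and it shows one plausible form of out-of-plane augmentation: rotating the data about the vertical axis before voxelizing.

```python
import numpy as np


def voxelize_flow(points, flows, grid_size=8, bounds=(-1.0, 1.0)):
    """Average 3D motion vectors into a (3, G, G, G) voxel grid.

    points: (N, 3) array of 3D positions (hypothetical input format).
    flows:  (N, 3) array of 3D motion vectors, one per point.
    """
    lo, hi = bounds
    # Map each coordinate to an integer voxel index in [0, grid_size).
    idx = np.floor((points - lo) / (hi - lo) * grid_size).astype(int)
    # Discard points that fall outside the cubic volume.
    inside = np.all((idx >= 0) & (idx < grid_size), axis=1)
    idx, flows = idx[inside], flows[inside]

    grid = np.zeros((3, grid_size, grid_size, grid_size))
    counts = np.zeros((grid_size, grid_size, grid_size))
    cell = (idx[:, 0], idx[:, 1], idx[:, 2])
    # np.add.at accumulates correctly when several points share a voxel.
    for c in range(3):
        np.add.at(grid[c], cell, flows[:, c])
    np.add.at(counts, cell, 1)

    # Turn per-voxel sums into per-voxel means; empty voxels stay zero.
    occupied = counts > 0
    grid[:, occupied] /= counts[occupied]
    return grid


def rotate_about_y(points, flows, angle_rad):
    """Rotate positions and motion vectors about the vertical (y) axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot = np.array([[c, 0.0, s],
                    [0.0, 1.0, 0.0],
                    [-s, 0.0, c]])
    return points @ rot.T, flows @ rot.T


# Usage: augment with a random viewpoint rotation, then voxelize.
rng = np.random.default_rng(0)
pts = rng.uniform(-0.5, 0.5, size=(100, 3))
vec = rng.normal(size=(100, 3))
pts_aug, vec_aug = rotate_about_y(pts, vec, rng.uniform(-np.pi / 4, np.pi / 4))
volume = voxelize_flow(pts_aug, vec_aug)
```

The resulting array has three channels (the x, y, and z components of motion) over a cubic grid, which is the input layout a 3D CNN would typically consume. Because the rotation is applied in 3D before voxelization, the augmented sample corresponds to a different camera viewpoint rather than an in-plane image transform.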