Towards Good Practices for Missing Modality Robust Action Recognition
November 25, 2022 · Entered Twilight · AAAI Conference on Artificial Intelligence
Repo contents: LICENSE, README.md, figure, lib, missing_skeletons.txt, requirements.txt, train_val_actionmae_multigpu.py, train_val_actionmae_multigpu.sh, train_val_baseline_multigpu.py, train_val_baseline_multigpu.sh
Authors
Sangmin Woo, Sumin Lee, Yeonju Park, Muhammad Adi Nugroho, Changick Kim
arXiv ID
2211.13916
Category
cs.CV: Computer Vision
Cross-listed
cs.AI, cs.LG
Citations
75
Venue
AAAI Conference on Artificial Intelligence
Repository
https://github.com/sangminwoo/ActionMAE
⭐ 23
Last Checked
1 month ago
Abstract
Standard multi-modal models assume that the same modalities are available in both the training and inference stages. In practice, however, the environment in which multi-modal models operate may not satisfy this assumption, and their performance degrades drastically if any modality is missing at inference. We ask: how can we train a model that is robust to missing modalities? This paper seeks a set of good practices for multi-modal action recognition, with a particular interest in circumstances where some modalities are not available at inference time. First, we study how to effectively regularize the model during training (e.g., data augmentation). Second, we investigate fusion methods for robustness to missing modalities: we find that transformer-based fusion is more robust to missing modalities than summation or concatenation. Third, we propose a simple modular network, ActionMAE, which learns missing modality predictive coding by randomly dropping modality features and trying to reconstruct them from the remaining modality features. Coupling these good practices, we build a model that is not only effective in multi-modal action recognition but also robust to missing modalities. Our model achieves state-of-the-art results on multiple benchmarks and maintains competitive performance even in missing modality scenarios. Code is available at https://github.com/sangminwoo/ActionMAE.
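The drop-and-reconstruct idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the modality names, the feature values, and the helper functions below are all hypothetical, and the "predictor" is a plain element-wise mean standing in for the learned network that a real model would train. It only shows the shape of the training signal: hide one modality's features at random, predict them from the rest, and measure the reconstruction error.

```python
import random


def drop_random_modality(features, rng):
    """Hide one modality chosen at random; return (remaining, dropped_name)."""
    dropped = rng.choice(sorted(features))
    remaining = {name: vec for name, vec in features.items() if name != dropped}
    return remaining, dropped


def reconstruct_from_remaining(remaining):
    """Placeholder predictor (an assumption for illustration only):
    the element-wise mean of the remaining feature vectors.
    A trained model would use a learned network here."""
    vectors = list(remaining.values())
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical per-modality feature vectors for one video clip.
features = {
    "rgb": [0.2, 0.4, 0.6],
    "depth": [0.1, 0.5, 0.7],
    "ir": [0.3, 0.3, 0.5],
}

rng = random.Random(0)  # fixed seed so the example is repeatable
remaining, dropped = drop_random_modality(features, rng)
reconstruction = reconstruct_from_remaining(remaining)
loss = mse(reconstruction, features[dropped])
print(f"dropped={dropped}, reconstruction loss={loss:.4f}")
```

In a real training loop, a loss of this kind would presumably be combined with the action classification loss so that the model learns to recognize actions from whichever modalities remain; how the paper weights or combines these terms is not stated in the abstract.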
Similar Papers
In the same crypt: Computer Vision
Old Age
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R.I.P.
Ghosted
You Only Look Once: Unified, Real-Time Object Detection
Old Age
SSD: Single Shot MultiBox Detector
Old Age
Squeeze-and-Excitation Networks
R.I.P.
Ghosted