Listen to Look: Action Recognition by Previewing Audio

December 10, 2019 · Entered Twilight · 🏛 Computer Vision and Pattern Recognition

🌅 TWILIGHT: Old Age
Predates the code-sharing era, a pioneer of its time

"No code URL or promise found in abstract"
"Code repo scraped from project page (backfill)"

Evidence collected by the PWNC Scanner

Repo contents: CODE_OF_CONDUCT.md, CONTRIBUTING.md, LICENSE, README.md, data.py, listen_to_look_single_modality, listen_to_look_teaser.png, main.py, models, opts.py, train.py, utils, validate.py

Authors: Ruohan Gao, Tae-Hyun Oh, Kristen Grauman, Lorenzo Torresani
arXiv ID: 1912.04487
Category: cs.CV (Computer Vision)
Cross-listed: cs.LG, cs.SD, eess.AS
Citations: 284
Venue: Computer Vision and Pattern Recognition
Repository: https://github.com/facebookresearch/Listen-to-Look (⭐ 130)
Last Checked: 6 days ago

Abstract
In the face of the video data deluge, today's expensive clip-level classifiers are increasingly impractical. We propose a framework for efficient action recognition in untrimmed video that uses audio as a preview mechanism to eliminate both short-term and long-term visual redundancies. First, we devise an ImgAud2Vid framework that hallucinates clip-level features by distilling from lighter modalities (a single frame and its accompanying audio), reducing short-term temporal redundancy for efficient clip-level recognition. Second, building on ImgAud2Vid, we further propose ImgAud-Skimming, an attention-based long short-term memory network that iteratively selects useful moments in untrimmed videos, reducing long-term temporal redundancy for efficient video-level recognition. Extensive experiments on four action recognition datasets demonstrate that our method achieves the state-of-the-art in terms of both recognition accuracy and speed.
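
The iterative moment selection described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation and does not use the code in the repository: the function names (`softmax`, `dot`, `skim`) are hypothetical, the same per-moment vectors serve as both attention keys and features, and the LSTM state update is replaced by a simple average of the previous query and the selected feature. It only shows the general idea of scoring every moment against a query, picking the highest-weighted one, and pooling the picked features.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def skim(features, query, steps):
    """Select `steps` moments by attention and average their features.

    features: one vector per moment (assumed here to double as attention keys).
    query:    initial query vector (an LSTM hidden state in the paper's setting;
              here just a plain list, which is an assumption of this sketch).
    """
    selected = []
    pooled = [0.0] * len(features[0])
    for t in range(steps):
        # Attention weights of the current query over all moments.
        weights = softmax([dot(query, f) for f in features])
        best = max(range(len(weights)), key=weights.__getitem__)
        selected.append(best)
        # Running mean of the features picked so far.
        pooled = [(p * t + f) / (t + 1) for p, f in zip(pooled, features[best])]
        # Stand-in for the recurrent update: blend query with the picked feature.
        query = [0.5 * q + 0.5 * f for q, f in zip(query, features[best])]
    return selected, pooled


# Three toy "moments" with 2-dimensional features.
moments = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
selected, pooled = skim(moments, query=[0.0, 2.0], steps=2)
```

With this toy update rule the same moment can be picked more than once; a real system would learn the query update rather than hard-code it.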