Learning Visual Voice Activity Detection with an Automatically Annotated Dataset

September 23, 2020 · Declared Dead · 🏛 International Conference on Pattern Recognition

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Sylvain Guy, Stéphane Lathuilière, Pablo Mesejo, Radu Horaud
arXiv ID: 2009.11204
Category: cs.CV (Computer Vision)
Citations: 11
Venue: International Conference on Pattern Recognition
Last Checked: 3 months ago
Abstract
Visual voice activity detection (V-VAD) uses visual features to predict whether a person is speaking or not. V-VAD is useful whenever audio VAD (A-VAD) is inefficient either because the acoustic signal is difficult to analyze or because it is simply missing. We propose two deep architectures for V-VAD, one based on facial landmarks and one based on optical flow. Moreover, available datasets, used for learning and for testing V-VAD, lack content variability. We introduce a novel methodology to automatically create and annotate very large datasets in-the-wild -- WildVVAD -- based on combining A-VAD with face detection and tracking. A thorough empirical evaluation shows the advantage of training the proposed deep V-VAD models with this dataset.
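The abstract only states that the dataset is annotated by combining A-VAD with face detection and tracking; it gives no further detail, and no code is available. As a minimal sketch of what such a labeling rule might look like, the Python below assigns a label to a face track from its overlap with A-VAD speech intervals. The `FaceTrack` type, the `label_track` function, the `min_ratio` threshold, and the rule itself are all hypothetical illustrations, not the authors' method.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (start_sec, end_sec) of a segment that an audio VAD marked as speech.
# Assumption: intervals are non-overlapping.
Interval = Tuple[float, float]


@dataclass
class FaceTrack:
    """Hypothetical container for one tracked face in a video."""
    track_id: int
    start: float  # seconds
    end: float    # seconds


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_track(
    track: FaceTrack,
    speech: List[Interval],
    min_ratio: float = 0.9,  # assumed threshold, not taken from the paper
) -> Optional[str]:
    """Return 'speaking', 'silent', or None when the track is ambiguous."""
    duration = track.end - track.start
    if duration <= 0:
        return None
    voiced = sum(overlap(track.start, track.end, s, e) for s, e in speech)
    ratio = voiced / duration
    if ratio >= min_ratio:
        return "speaking"
    if ratio <= 1.0 - min_ratio:
        return "silent"
    return None  # mixed speech and silence: discard rather than guess


speech_segments = [(0.0, 2.5)]
labels = [
    label_track(FaceTrack(0, 0.0, 2.0), speech_segments),
    label_track(FaceTrack(1, 3.0, 5.0), speech_segments),
    label_track(FaceTrack(2, 2.0, 4.0), speech_segments),
]
```

Discarding ambiguous tracks trades dataset size for label purity, which is one plausible design choice when annotations come from an automatic detector rather than from human raters.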
Community shame:
Not yet rated