Temporal Multimodal Fusion for Video Emotion Classification in the Wild

September 21, 2017 Β· Declared Dead Β· πŸ› International Conference on Multimodal Interaction

πŸ‘» CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Valentin Vielzeuf, StΓ©phane Pateux, FrΓ©dΓ©ric Jurie arXiv ID 1709.07200 Category cs.CV: Computer Vision Cross-listed cs.LG, cs.MM Citations 176 Venue International Conference on Multimodal Interaction Last Checked 4 months ago
Abstract
This paper addresses the question of emotion classification. The task consists in predicting emotion labels (taken among a set of possible labels) best describing the emotions contained in short video clips. Building on a standard framework -- lying in describing videos by audio and visual features used by a supervised classifier to infer the labels -- this paper investigates several novel directions. First of all, improved face descriptors based on 2D and 3D Convo-lutional Neural Networks are proposed. Second, the paper explores several fusion methods, temporal and multimodal, including a novel hierarchical method combining features and scores. In addition, we carefully reviewed the different stages of the pipeline and designed a CNN architecture adapted to the task; this is important as the size of the training set is small compared to the difficulty of the problem, making generalization difficult. The so-obtained model ranked 4th at the 2017 Emotion in the Wild challenge with the accuracy of 58.8 %.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

πŸ“œ Similar Papers

In the same crypt β€” Computer Vision

πŸŒ… πŸŒ… Old Age

Fast R-CNN

Ross Girshick

cs.CV πŸ› ICCV πŸ“š 27.7K cites 11 years ago

Died the same way β€” πŸ‘» Ghosted