Transductive Zero-Shot Action Recognition by Word-Vector Embedding

November 13, 2015 · Declared Dead · 🏛 International Journal of Computer Vision

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Xun Xu, Timothy Hospedales, Shaogang Gong
arXiv ID: 1511.04458
Category: cs.CV (Computer Vision)
Citations: 173
Venue: International Journal of Computer Vision
Last Checked: 4 months ago

Abstract

The number of categories for action recognition is growing rapidly and it has become increasingly hard to label sufficient training data for learning conventional models for all categories. Instead of collecting ever more data and labelling them exhaustively for all categories, an attractive alternative approach is "zero-shot learning" (ZSL). To that end, in this study we construct a mapping between visual features and a semantic descriptor of each action category, allowing new categories to be recognised in the absence of any visual training data. Existing ZSL studies focus primarily on still images and attribute-based semantic representations. In this work, we explore word-vectors as the shared semantic space to embed videos and category labels for ZSL action recognition. This is a more challenging problem than existing ZSL of still images and/or attributes, because the mapping between video spacetime features of actions and the semantic space is more complex and harder to learn for the purpose of generalising over any cross-category domain shift. To solve this generalisation problem in ZSL action recognition, we investigate a series of synergistic strategies to improve upon the standard ZSL pipeline. Most of these strategies are transductive in nature, which means they assume access to testing data in the training phase.
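The standard ZSL pipeline that the abstract refers to can be illustrated with a minimal sketch: learn a mapping from visual features to the word-vector space on seen categories, then label a test video with the unseen category whose word vector lies closest to its projection. The abstract does not name the regression model or the matching rule, so the ridge regression and cosine nearest-neighbour used here are assumptions, as are all function names and the synthetic toy data. The sketch does not include any of the transductive strategies.

```python
import numpy as np


def fit_ridge_mapping(X, Z, lam=1.0):
    """Learn W (d x k) mapping visual features X (n x d) to word vectors Z (n x k).

    Ridge regression is an assumed choice; the abstract only says "a mapping".
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Z)


def l2_normalise(A, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return A / (np.linalg.norm(A, axis=1, keepdims=True) + eps)


def zero_shot_predict(X_test, W, prototypes):
    """Project test features and return the index of the nearest unseen prototype."""
    projected = l2_normalise(X_test @ W)
    protos = l2_normalise(prototypes)
    return np.argmax(projected @ protos.T, axis=1)


def make_toy_data(seed=0, per_class=20, noise=0.01):
    """Synthetic stand-in for video features and word vectors (hypothetical data)."""
    rng = np.random.default_rng(seed)
    # Hand-picked 3-D "word vectors": four seen and two unseen categories.
    seen = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
    unseen = np.array([[1, -1, 0], [0, 1, -1]], dtype=float)
    # Visual features are a noisy linear image of the category word vector.
    A = rng.normal(size=(3, 8))
    y_train = np.repeat(np.arange(len(seen)), per_class)
    y_test = np.repeat(np.arange(len(unseen)), per_class)
    Z_train = seen[y_train]
    X_train = Z_train @ A + noise * rng.normal(size=(len(y_train), 8))
    X_test = unseen[y_test] @ A + noise * rng.normal(size=(len(y_test), 8))
    return X_train, Z_train, X_test, y_test, unseen


if __name__ == "__main__":
    X_tr, Z_tr, X_te, y_te, unseen_protos = make_toy_data()
    W = fit_ridge_mapping(X_tr, Z_tr)
    pred = zero_shot_predict(X_te, W, unseen_protos)
    print("zero-shot accuracy on toy data:", (pred == y_te).mean())
```

The toy data is deliberately easy: the seen word vectors span the whole semantic space, so a mapping fitted on them transfers to the unseen categories. With real video features the mapping fitted on seen categories can transfer poorly, which is the cross-category domain shift the abstract describes.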