Impact of Three-Dimensional Video Scalability on Multi-View Activity Recognition using Deep Learning

September 29, 2017 · Declared Dead · 🏛 ACM Multimedia

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Jun-Ho Choi, Manri Cheon, Min-Su Choi, Jong-Seok Lee
arXiv ID: 1709.10206
Category: cs.MM (Multimedia)
Citations: 4
Venue: ACM Multimedia
Last Checked: 3 months ago

Abstract

Human activity recognition is an important research topic in computer vision and video understanding. It is often assumed that high-quality video sequences are available for recognition. However, relaxing this requirement and implementing robust recognition on videos with reduced data rates can make storing and transmitting video data more efficient. Three-dimensional video scalability, which refers to the possibility of reducing the spatial, temporal, and quality resolutions of videos, is an effective way to represent and manage video data flexibly. In this paper, we investigate the impact of video scalability on multi-view activity recognition. We employ both a spatiotemporal feature extraction-based method and a deep learning-based method using convolutional and recurrent neural networks. The recognition performance of the two methods is examined, along with an in-depth analysis of how their performance varies with respect to various scalability combinations. In particular, we demonstrate that the deep learning-based method can achieve significantly improved robustness in comparison to the feature-based method. Furthermore, we investigate optimal scalability combinations with respect to bitrate in order to provide useful guidelines for an optimal operation policy in resource-constrained activity recognition systems.
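To make the three scalability dimensions named in the abstract concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the function name `scale_video`, the step-size parameters, and the toy clip are all hypothetical, and pixel-level quantization is only a simple stand-in for the quality reduction that a real video encoder would apply.

```python
from itertools import product

def scale_video(frames, spatial_step=1, temporal_step=1, quant_step=1):
    """Reduce a video along three dimensions (illustrative only).

    frames: list of frames, each a list of rows of integer pixel values.
    spatial_step:  keep every n-th row and column (spatial resolution).
    temporal_step: keep every n-th frame (temporal resolution).
    quant_step:    round pixels down to multiples of n (assumed proxy
                   for quality reduction; real codecs work differently).
    """
    scaled = []
    for frame in frames[::temporal_step]:
        rows = frame[::spatial_step]
        scaled.append([
            [(p // quant_step) * quant_step for p in row[::spatial_step]]
            for row in rows
        ])
    return scaled

# Toy clip with made-up values: 8 frames of 4x4 grayscale pixels.
video = [
    [[(t * 16 + y * 4 + x) % 256 for x in range(4)] for y in range(4)]
    for t in range(8)
]

# Enumerate every combination of the (hypothetical) step sizes.
for s, t, q in product([1, 2], [1, 2, 4], [1, 8]):
    scaled = scale_video(video, s, t, q)
    shape = (len(scaled), len(scaled[0]), len(scaled[0][0]))
    print(f"spatial={s} temporal={t} quant={q} -> frames x rows x cols = {shape}")
```

Each combination yields a differently reduced clip. In a study like the one described, each such variant would be encoded, its bitrate measured, and the recognition method evaluated on it.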