Learning Human Motion Models for Long-term Predictions

April 10, 2017 · Declared Dead · 🏛 International Conference on 3D Vision

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Partha Ghosh, Jie Song, Emre Aksan, Otmar Hilliges
arXiv ID: 1704.02827
Category: cs.CV (Computer Vision)
Citations: 249
Venue: International Conference on 3D Vision
Last Checked: 3 months ago

Abstract

We propose a new architecture for learning predictive spatio-temporal motion models from data alone. Our approach, dubbed the Dropout Autoencoder LSTM, is capable of synthesizing natural-looking motion sequences over long time horizons without catastrophic drift or motion degradation. The model consists of two components: a 3-layer recurrent neural network that models temporal aspects, and a novel auto-encoder that is trained to implicitly recover the spatial structure of the human skeleton by randomly removing information about joints during training. This Dropout Autoencoder (D-AE) is then used to filter each predicted pose of the LSTM, reducing the accumulation of error and hence drift over time. Furthermore, we propose new evaluation protocols to assess the quality of synthetic motion sequences even when no ground-truth data exists. The proposed protocols can be used to assess generated sequences of arbitrary length. Finally, we evaluate our proposed method on two of the largest motion-capture datasets available to date and show that our model outperforms the state of the art on a variety of actions, including cyclic and acyclic motion, and that it can produce natural-looking sequences over longer time horizons than previous methods.
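
The two ideas in the abstract, dropping joints during auto-encoder training and filtering every predicted pose before it is fed back into the predictor, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`drop_joints`, `rollout`, `toy_predictor`, `toy_filter`), the 17-joint pose shape, and the drop probability are all assumptions made for illustration, and the toy predictor and filter are stateless stand-ins for the paper's 3-layer LSTM and trained D-AE.

```python
import numpy as np


def drop_joints(pose, drop_prob, rng):
    """Zero out whole joints at random (hypothetical helper).

    pose has shape (num_joints, 3). Returns the corrupted pose and the
    boolean mask of joints that were kept. An auto-encoder trained to
    reconstruct `pose` from the corrupted copy has to infer the missing
    joints from the remaining ones.
    """
    keep = rng.random(pose.shape[0]) >= drop_prob  # one decision per joint
    return pose * keep[:, None], keep


def rollout(seed_pose, predict_next, filter_pose, num_steps):
    """Autoregressive rollout in which each prediction is filtered.

    predict_next and filter_pose are plain callables here; in the paper
    they would be the recurrent network and the Dropout Autoencoder.
    """
    poses = []
    pose = seed_pose
    for _ in range(num_steps):
        # Filter the prediction BEFORE feeding it back, so that errors
        # are corrected at every step instead of accumulating.
        pose = filter_pose(predict_next(pose))
        poses.append(pose)
    return np.stack(poses)


# Toy stand-ins, assumed for illustration only.
def toy_predictor(pose):
    return pose + 0.1  # adds a constant drift at every step


def toy_filter(pose):
    return np.clip(pose, -1.0, 1.0)  # projects back into a "valid" range


rng = np.random.default_rng(0)
corrupted, keep = drop_joints(np.ones((17, 3)), drop_prob=0.5, rng=rng)
sequence = rollout(np.zeros((17, 3)), toy_predictor, toy_filter, num_steps=20)
```

In this toy setting the unfiltered predictor would drift without bound, while the filtered rollout stays inside the range the filter enforces. That mirrors, in a very simplified form, the role the abstract assigns to the D-AE.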