R.I.P.
Ghosted
Z-Forcing: Training Stochastic Recurrent Networks
November 15, 2017 · Declared Dead · Neural Information Processing Systems
Authors
Anirudh Goyal, Alessandro Sordoni, Marc-Alexandre Côté, Nan Rosemary Ke, Yoshua Bengio
arXiv ID
1711.05411
Category
stat.ML: Machine Learning (Stat)
Cross-listed
cs.LG
Citations
194
Venue
Neural Information Processing Systems
Repository
https://github.com/anirudh9119/zforcing_nips17
Last Checked
1 month ago
Abstract
Many efforts have been devoted to training generative latent variable models with autoregressive decoders, such as recurrent neural networks (RNNs). Stochastic recurrent models have been successful in capturing the variability observed in natural sequential data such as speech. We unify successful ideas from recently proposed architectures into a stochastic recurrent model: each step in the sequence is associated with a latent variable that is used to condition the recurrent dynamics for future steps. Training is performed with amortized variational inference where the approximate posterior is augmented with an RNN that runs backward through the sequence. In addition to maximizing the variational lower bound, we ease training of the latent variables by adding an auxiliary cost which forces them to reconstruct the state of the backward recurrent network. This provides the latent variables with a task-independent objective that enhances the performance of the overall model. We found this strategy to perform better than alternative approaches such as KL annealing. Although conceptually simple, our model achieves state-of-the-art results on standard speech benchmarks such as TIMIT and Blizzard and competitive performance on sequential MNIST. Finally, we apply our model to language modeling on the IMDB dataset, where the auxiliary cost helps in learning interpretable latent variables. Source code: https://github.com/anirudh9119/zforcing_nips17
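The abstract describes a training loss that combines the variational lower bound with an auxiliary cost on the latent variables. Below is a minimal sketch of what one step of such a loss could look like, assuming diagonal Gaussian distributions throughout. It is not taken from the authors' repository: the function names and the `aux_weight` and `kl_weight` parameters are hypothetical, chosen only for illustration.

```python
import math


def gaussian_nll(x, mean, log_var):
    """Negative log-density of x under a diagonal Gaussian, summed over dims."""
    return sum(
        0.5 * (math.log(2 * math.pi) + lv + (xi - m) ** 2 / math.exp(lv))
        for xi, m, lv in zip(x, mean, log_var)
    )


def gaussian_kl(q_mean, q_log_var, p_mean, p_log_var):
    """KL(q || p) between two diagonal Gaussians, summed over dims."""
    return sum(
        0.5 * (plv - qlv + (math.exp(qlv) + (qm - pm) ** 2) / math.exp(plv) - 1.0)
        for qm, qlv, pm, plv in zip(q_mean, q_log_var, p_mean, p_log_var)
    )


def step_loss(recon_nll, kl, aux_nll, aux_weight=1.0, kl_weight=1.0):
    """Per-step loss (assumed form): negative ELBO plus a weighted auxiliary cost.

    recon_nll: negative log-likelihood of the observation at this step
    kl:        KL between approximate posterior and prior over the latent
    aux_nll:   negative log-likelihood of the backward RNN state given the latent
    """
    return recon_nll + kl_weight * kl + aux_weight * aux_nll


# Hypothetical toy values for a single time step with a 2-dimensional latent.
recon = gaussian_nll([0.3, -0.1], [0.0, 0.0], [0.0, 0.0])
kl = gaussian_kl([0.2, 0.1], [-0.5, -0.5], [0.0, 0.0], [0.0, 0.0])
aux = gaussian_nll([1.0, 0.5], [0.8, 0.4], [0.0, 0.0])  # backward state vs. its prediction
loss = step_loss(recon, kl, aux, aux_weight=0.5)
```

Setting `aux_weight` to zero recovers a plain negative lower bound, and varying `kl_weight` over training would correspond to the KL annealing baseline the abstract mentions.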
Similar Papers
In the same crypt: Machine Learning (Stat)
R.I.P.
Ghosted
Distilling the Knowledge in a Neural Network
R.I.P.
Ghosted
Layer Normalization
R.I.P.
Ghosted
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
R.I.P.
Ghosted
Domain-Adversarial Training of Neural Networks
R.I.P.
Ghosted
Deep Learning with Differential Privacy
Died the same way: 404 Not Found
R.I.P.
404 Not Found
Deep High-Resolution Representation Learning for Visual Recognition
R.I.P.
404 Not Found
HuggingFace's Transformers: State-of-the-art Natural Language Processing
R.I.P.
404 Not Found
CCNet: Criss-Cross Attention for Semantic Segmentation
R.I.P.
404 Not Found