Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing
March 25, 2019 · Entered Twilight · North American Chapter of the Association for Computational Linguistics
"Last commit was 5.0 years ago (≥5 year threshold)"
Evidence collected by the PWNC Scanner
Repo contents: .gitignore, README.md, dialog, figs, language_model, plot, semi
Authors
Hao Fu, Chunyuan Li, Xiaodong Liu, Jianfeng Gao, Asli Celikyilmaz, Lawrence Carin
arXiv ID
1903.10145
Category
cs.LG: Machine Learning
Cross-listed
cs.AI, cs.CL, cs.CV, stat.ML
Citations
417
Venue
North American Chapter of the Association for Computational Linguistics
Repository
https://github.com/haofuml/cyclical_annealing
★ 198
Last Checked
1 month ago
Abstract
Variational autoencoders (VAEs) with an auto-regressive decoder have been applied for many natural language processing (NLP) tasks. The VAE objective consists of two terms, (i) reconstruction and (ii) KL regularization, balanced by a weighting hyper-parameter β. One notorious training difficulty is that the KL term tends to vanish. In this paper we study scheduling schemes for β, and show that KL vanishing is caused by the lack of good latent codes in training the decoder at the beginning of optimization. To remedy this, we propose a cyclical annealing schedule, which repeats the process of increasing β multiple times. This new procedure allows the progressive learning of more meaningful latent codes, by leveraging the informative representations of previous cycles as warm re-starts. The effectiveness of cyclical annealing is validated on a broad range of NLP tasks, including language modeling, dialog response generation and unsupervised language pre-training.
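As a rough illustration of what "repeating the process of increasing β multiple times" can look like, here is a minimal sketch of a cyclical schedule. The function name `cyclical_beta`, the parameters `n_cycles`, `ramp_fraction`, and `beta_max`, their default values, and the choice of a linear ramp are all assumptions made for this example; the abstract does not specify them, and this is not taken from the linked repository.

```python
def cyclical_beta(step, total_steps, n_cycles=4, ramp_fraction=0.5, beta_max=1.0):
    """Return the KL weight beta for a 0-indexed training step.

    Hypothetical helper: training is split into n_cycles equal cycles.
    Within each cycle, beta rises linearly from 0 to beta_max over the
    first ramp_fraction of the cycle, then stays at beta_max.
    """
    if not 0 <= step < total_steps:
        raise ValueError("step must satisfy 0 <= step < total_steps")
    if n_cycles < 1 or not 0 < ramp_fraction <= 1:
        raise ValueError("need n_cycles >= 1 and 0 < ramp_fraction <= 1")

    cycle_len = total_steps / n_cycles
    # Relative position inside the current cycle, in [0, 1).
    position = (step % cycle_len) / cycle_len
    if position < ramp_fraction:
        return beta_max * position / ramp_fraction
    return beta_max


# Usage sketch: weight the KL term of the VAE loss at each step, e.g.
#   loss = reconstruction_loss + cyclical_beta(step, total_steps) * kl_loss
schedule = [cyclical_beta(t, total_steps=100) for t in range(100)]
```

With these assumed defaults, β is reset to 0 at the start of every cycle, so each cycle begins from the decoder and latent codes learned in the previous one, which is the warm re-start idea the abstract describes.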
Similar Papers
In the same crypt: Machine Learning
XGBoost: A Scalable Tree Boosting System
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Semi-Supervised Classification with Graph Convolutional Networks
Proximal Policy Optimization Algorithms