Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing

March 25, 2019 · Entered Twilight · 🏛 North American Chapter of the Association for Computational Linguistics

🌅 TWILIGHT: Old Age
Predates the code-sharing era, a pioneer of its time

"Last commit was 5.0 years ago (β‰₯5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: .gitignore, README.md, dialog, figs, language_model, plot, semi

Authors: Hao Fu, Chunyuan Li, Xiaodong Liu, Jianfeng Gao, Asli Celikyilmaz, Lawrence Carin
arXiv ID: 1903.10145
Category: cs.LG: Machine Learning
Cross-listed: cs.AI, cs.CL, cs.CV, stat.ML
Citations: 417
Venue: North American Chapter of the Association for Computational Linguistics
Repository: https://github.com/haofuml/cyclical_annealing ⭐ 198
Last Checked: 1 month ago
Abstract
Variational autoencoders (VAEs) with an auto-regressive decoder have been applied to many natural language processing (NLP) tasks. The VAE objective consists of two terms, (i) reconstruction and (ii) KL regularization, balanced by a weighting hyper-parameter β. One notorious training difficulty is that the KL term tends to vanish. In this paper we study scheduling schemes for β, and show that KL vanishing is caused by the lack of good latent codes in training the decoder at the beginning of optimization. To remedy this, we propose a cyclical annealing schedule, which repeats the process of increasing β multiple times. This new procedure allows the progressive learning of more meaningful latent codes, by leveraging the informative representations of previous cycles as warm re-starts. The effectiveness of cyclical annealing is validated on a broad range of NLP tasks, including language modeling, dialog response generation and unsupervised language pre-training.
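The schedule the abstract describes can be sketched in a few lines: training is split into several cycles, and within each cycle the KL weight β ramps from 0 to 1 and then holds. This is a minimal illustrative sketch, not the repository's own implementation; the function name and parameter names (`n_cycles`, `ratio`) are assumptions chosen for clarity.

```python
def cyclical_beta(step, total_steps, n_cycles=4, ratio=0.5):
    """Cyclical annealing schedule for the KL weight beta.

    Splits training into `n_cycles` equal cycles. Within each cycle,
    beta rises linearly from 0 to 1 over the first `ratio` fraction
    of the cycle, then stays clamped at 1 for the remainder.
    (Illustrative sketch; parameter names are assumptions.)
    """
    cycle_len = total_steps / n_cycles
    # Position within the current cycle, normalized to [0, 1)
    tau = (step % cycle_len) / cycle_len
    # Linear ramp during the first `ratio` of the cycle, then hold at 1
    return min(tau / ratio, 1.0)


# Example: 100 steps, 2 cycles -> beta ramps over steps 0-24,
# holds at 1 for steps 25-49, then the pattern repeats at step 50.
betas = [cyclical_beta(t, total_steps=100, n_cycles=2) for t in range(100)]
```

Each new cycle resets β to 0 while the decoder keeps the weights learned under β = 1, which is the "warm re-start" effect the abstract refers to.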
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

📜 Similar Papers

In the same crypt: Machine Learning