Efficient softmax approximation for GPUs
September 14, 2016 · Entered Twilight · International Conference on Machine Learning
"Last commit was 8.0 years ago (≥5 year threshold)"
Evidence collected by the PWNC Scanner
Repo contents: CONTRIBUTING.md, LICENSE, PATENTS, README.md, data, train_big_lstm.lua, utils
Authors
Edouard Grave, Armand Joulin, Moustapha CissΓ©, David Grangier, HervΓ© JΓ©gou
arXiv ID
1609.04309
Category
cs.CL: Computation & Language
Cross-listed
cs.LG
Citations
290
Venue
International Conference on Machine Learning
Repository
https://github.com/facebookresearch/adaptive-softmax
★ 396
Last Checked
1 month ago
Abstract
We propose an approximate strategy to efficiently train neural network based language models over very large vocabularies. Our approach, called adaptive softmax, circumvents the linear dependency on the vocabulary size by exploiting the unbalanced word distribution to form clusters that explicitly minimize the expectation of computation time. Our approach further reduces the computational time by exploiting the specificities of modern architectures and matrix-matrix vector operations, making it particularly suited for graphical processing units. Our experiments carried out on standard benchmarks, such as EuroParl and One Billion Word, show that our approach brings a large gain in efficiency over standard approximations while achieving an accuracy close to that of the full softmax. The code of our method is available at https://github.com/facebookresearch/adaptive-softmax.
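To make the clustering idea concrete, here is a minimal, hypothetical sketch of how a frequency-sorted vocabulary might be split into a frequent "head" and rare "tail" clusters so that the expected per-token cost is dominated by a small head. The cutoffs and the proportional cost model below are illustrative assumptions, not the paper's actual GPU cost model or cluster-assignment algorithm.

```python
def make_clusters(freqs, cutoffs):
    """freqs: word counts sorted in decreasing order.
    cutoffs: boundaries, e.g. [1000, 4000] -> head + 2 tail clusters.
    Returns a list of (start, end) index ranges, head first."""
    bounds = [0] + list(cutoffs) + [len(freqs)]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]

def expected_cost(freqs, clusters):
    """Expected output-layer width per token under a simple
    cost-proportional-to-size model (real GPU cost is more complex)."""
    total = sum(freqs)
    head_lo, head_hi = clusters[0]
    # Every token scores the head words plus one unit per tail cluster.
    cost = (head_hi - head_lo) + (len(clusters) - 1)
    # A tail token additionally scores the words of its own cluster.
    for lo, hi in clusters[1:]:
        p_cluster = sum(freqs[lo:hi]) / total
        cost += p_cluster * (hi - lo)
    return cost

# Toy Zipfian vocabulary of 10,000 words (cutoffs chosen arbitrarily):
freqs = [1.0 / rank for rank in range(1, 10001)]
clusters = make_clusters(freqs, cutoffs=[1000, 4000])
print(expected_cost(freqs, clusters) < len(freqs))  # far cheaper than full softmax
```

Because frequent words carry most of the probability mass, most tokens only pay for the small head matrix, which is what makes the expected computation time much lower than a full softmax over the whole vocabulary.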
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
Similar Papers
In the same crypt · Computation & Language
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
R.I.P. 👻 Ghosted
Language Models are Few-Shot Learners
R.I.P. 👻 Ghosted
RoBERTa: A Robustly Optimized BERT Pretraining Approach
R.I.P. 👻 Ghosted
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
R.I.P. 👻 Ghosted