Direct Output Connection for a High-Rank Language Model
August 30, 2018 · Entered Twilight · Conference on Empirical Methods in Natural Language Processing
"Last commit was 7.0 years ago (โฅ5 year threshold)"
Evidence collected by the PWNC Scanner
Repo contents: LICENSE, NTT_LICENSE, README.md, cal_ppl.py, data.py, embed_regularize.py, finetune.py, generate.py, get_data.sh, locked_dropout.py, main.py, model.py, utils.py, weight_drop.py
Authors
Sho Takase, Jun Suzuki, Masaaki Nagata
arXiv ID
1808.10143
Category
cs.CL: Computation & Language
Citations
37
Venue
Conference on Empirical Methods in Natural Language Processing
Repository
https://github.com/nttcslab-nlp/doc_lm
⭐ 12
Last Checked
1 month ago
Abstract
This paper proposes a state-of-the-art recurrent neural network (RNN) language model that combines probability distributions computed not only from the final RNN layer but also from middle layers. The proposed method raises the expressive power of a language model based on the matrix factorization interpretation of language modeling introduced by Yang et al. (2018). It improves on the previous state-of-the-art language model and achieves the best scores on the standard benchmark datasets Penn Treebank and WikiText-2. Moreover, we show that the proposed method also improves two application tasks: machine translation and headline generation. Our code is publicly available at: https://github.com/nttcslab-nlp/doc_lm.
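The abstract's core idea, mixing softmax distributions computed from middle and final RNN layers to raise the rank of the implied log-probability matrix, can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration, not the authors' model.py: the two-layer LSTM setup, the layer sizes, and the fixed scalar mixture weights are assumptions made for brevity (in the paper, mixture weights are computed per token and each layer can emit several softmax components).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DirectOutputConnectionLM(nn.Module):
    """Minimal sketch of a language model that mixes output distributions
    from a middle and a final RNN layer, in the spirit of Takase et al.
    (2018). Hyperparameters and the mixture parameterization are
    illustrative assumptions, not the paper's configuration."""

    def __init__(self, vocab_size, emb_dim=400, hidden_dim=960):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Two stacked RNN layers; distributions are drawn from both.
        self.rnn1 = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.rnn2 = nn.LSTM(hidden_dim, emb_dim, batch_first=True)
        # Project the middle layer into the shared output space.
        self.proj1 = nn.Linear(hidden_dim, emb_dim)
        self.out = nn.Linear(emb_dim, vocab_size)  # shared output matrix
        # Learned scalar mixture logits over the two distributions.
        self.mix_logits = nn.Parameter(torch.zeros(2))

    def forward(self, tokens):
        x = self.embed(tokens)          # (B, T, emb_dim)
        h1, _ = self.rnn1(x)            # middle-layer states
        h2, _ = self.rnn2(h1)           # final-layer states
        # Per-layer probability distributions over the vocabulary.
        p1 = F.softmax(self.out(torch.tanh(self.proj1(h1))), dim=-1)
        p2 = F.softmax(self.out(h2), dim=-1)
        # Convex combination of the two distributions; the mixture is
        # no longer a single softmax of one hidden vector, so the
        # log-probability matrix can exceed the embedding-dimension rank.
        w = F.softmax(self.mix_logits, dim=0)
        return w[0] * p1 + w[1] * p2    # (B, T, vocab_size)

model = DirectOutputConnectionLM(vocab_size=10000)
probs = model(torch.randint(0, 10000, (2, 5)))  # (2, 5, 10000); rows sum to 1
```

Training such a model would minimize the negative log of the mixed distribution; this is the same mechanism by which the mixture-of-softmaxes construction of Yang et al. (2018) breaks the softmax rank bottleneck.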
Similar Papers
In the same crypt · Computation & Language
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
R.I.P. · 👻 Ghosted
Language Models are Few-Shot Learners
R.I.P. · 👻 Ghosted
RoBERTa: A Robustly Optimized BERT Pretraining Approach
R.I.P. · 👻 Ghosted
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
R.I.P. · 👻 Ghosted