Learning Light-Weight Translation Models from Deep Transformer

December 27, 2020 · Entered Twilight · 🏛 AAAI Conference on Artificial Intelligence

🌅 TWILIGHT: Old Age
Predates the code-sharing era — a pioneer of its time

"Last commit was 5.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: .DS_Store, .gitignore, CONTRIBUTING.md, LICENSE, PATENTS, README.md, docs, eval_lm.py, examples, extract.py, fairseq.gif, fairseq, fairseq_cli, fairseq_logo.png, generate.py, group_permutation_train.sh, interactive.py, preprocess.py, preprocess.sh, rerank.py, score.py, scripts, setup.py, skipping_sublayer_train.sh, tests, train.py, translate.sh

Authors: Bei Li, Ziyang Wang, Hui Liu, Quan Du, Tong Xiao, Chunliang Zhang, Jingbo Zhu
arXiv ID: 2012.13866
Category: cs.CL: Computation & Language
Citations: 43
Venue: AAAI Conference on Artificial Intelligence
Repository: https://github.com/libeineu/GPKD ⭐ 15
Last Checked: 1 month ago
Abstract
Recently, deep models have shown tremendous improvements in neural machine translation (NMT). However, systems of this kind are computationally expensive and memory intensive. In this paper, we take a natural step towards learning strong but light-weight NMT systems. We propose a novel group-permutation based knowledge distillation approach to compressing the deep Transformer model into a shallow model. The experimental results on several benchmarks validate the effectiveness of our method. Our compressed model is 8X shallower than the deep model, with almost no loss in BLEU. To further enhance the teacher model, we present a Skipping Sub-Layer method that randomly omits sub-layers to introduce perturbation into training, which achieves a BLEU score of 30.63 on English-German newstest2014. The code is publicly available at https://github.com/libeineu/GPKD.
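The two ideas in the abstract can be illustrated with a minimal, non-authoritative sketch. This is not the paper's implementation (which operates on fairseq Transformer sub-layers); the function names, scalar "sub-layers", and `p_skip` parameter are illustrative assumptions only. `group_permute` shuffles layer order within consecutive groups, mimicking the group-permutation training trick, and `skipping_sublayer_forward` drops each residual sub-layer with some probability during training:

```python
import random

def group_permute(layers, group_size, rng):
    """Sketch of group-permutation: shuffle layer order only within each
    consecutive group of `group_size` layers, leaving group boundaries intact.
    (Hypothetical helper, not from the GPKD codebase.)"""
    out = []
    for i in range(0, len(layers), group_size):
        group = layers[i:i + group_size]
        rng.shuffle(group)   # permute within the group only
        out.extend(group)
    return out

def skipping_sublayer_forward(x, sublayers, p_skip=0.2, training=True, seed=None):
    """Sketch of the Skipping Sub-Layer idea: during training, each residual
    sub-layer is omitted with probability `p_skip`, so the residual branch
    passes the input through unchanged. Illustrative scalar version only."""
    rng = random.Random(seed)
    for f in sublayers:
        if training and rng.random() < p_skip:
            continue          # sub-layer skipped: x stays as-is
        x = x + f(x)          # standard residual connection
    return x
```

With `p_skip=0.0` every sub-layer runs (ordinary forward pass); with `training=False` the skipping is disabled entirely, matching the usual train/inference asymmetry of stochastic-depth-style methods.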
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

📜 Similar Papers

In the same crypt — Computation & Language

🌅 🌅 Old Age

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, ... (+6 more)

cs.CL ๐Ÿ› NeurIPS ๐Ÿ“š 166.0K cites 8 years ago