Very Deep Transformers for Neural Machine Translation

August 18, 2020 · Declared Dead · 🏛 arXiv.org

📜 CAUSE OF DEATH: Death by README
Repo has only a README

Repo contents: README.md

Authors: Xiaodong Liu, Kevin Duh, Liyuan Liu, Jianfeng Gao
arXiv ID: 2008.07772
Category: cs.CL (Computation & Language)
Citations: 110
Venue: arXiv.org
Repository: https://github.com/namisan/exdeep-nmt
Stars: ⭐ 32
Last Checked: 1 month ago
Abstract
We explore the application of very deep Transformer models for Neural Machine Translation (NMT). Using a simple yet effective initialization technique that stabilizes training, we show that it is feasible to build standard Transformer-based models with up to 60 encoder layers and 12 decoder layers. These deep models outperform their baseline 6-layer counterparts by as much as 2.5 BLEU, and achieve new state-of-the-art benchmark results on WMT14 English-French (43.8 BLEU, and 46.4 BLEU with back-translation) and WMT14 English-German (30.1 BLEU). The code and trained models will be publicly available at: https://github.com/namisan/exdeep-nmt.

📜 Similar Papers

In the same crypt: Computation & Language

🌅 Old Age

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, ... (+6 more)

cs.CL ๐Ÿ› NeurIPS ๐Ÿ“š 166.0K cites 8 years ago

Died the same way: 📜 Death by README