XLNet: Generalized Autoregressive Pretraining for Language Understanding

June 19, 2019 · Entered Twilight · 🏛 Neural Information Processing Systems

🌅 TWILIGHT: Old Age
Predates the code-sharing era – a pioneer of its time

"Last commit was 6.0 years ago (โ‰ฅ5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: .gitignore, LICENSE, README.md, __init__.py, classifier_utils.py, data_utils.py, function_builder.py, gpu_utils.py, misc, model_utils.py, modeling.py, notebooks, prepro_utils.py, run_classifier.py, run_race.py, run_squad.py, scripts, squad_utils.py, tpu_estimator.py, train.py, train_gpu.py, xlnet.py

Authors: Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le
arXiv ID: 1906.08237
Category: cs.CL (Computation & Language)
Cross-listed: cs.LG
Citations: 9.2K
Venue: Neural Information Processing Systems
Repository: https://github.com/zihangdai/xlnet (⭐ 6175)
Last checked: 1 month ago
Abstract
With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining. Empirically, under comparable experiment settings, XLNet outperforms BERT on 20 tasks, often by a large margin, including question answering, natural language inference, sentiment analysis, and document ranking.
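The objective the abstract alludes to, permutation language modeling, can be stated concretely: for a sequence x = (x_1, ..., x_T), XLNet maximizes E_{z ~ Z_T} [ sum_{t=1}^T log p_θ(x_{z_t} | x_{z<t}) ], where Z_T is the set of all permutations of {1, ..., T}. Because every token is eventually predicted from context on both sides under some factorization order, the model learns bidirectional context while remaining autoregressive. Below is a minimal Python sketch of that objective; it is not the authors' TensorFlow code from the repository above, and toy_log_prob is a hypothetical uniform stand-in for the Transformer-XL backbone with two-stream attention that XLNet actually uses.

import math
import random

VOCAB_SIZE = 8  # hypothetical toy vocabulary size

def toy_log_prob(target_token, context_tokens):
    # Hypothetical stand-in for p_theta(x_{z_t} | x_{z_<t}).
    # Returns a uniform log-probability; the real model scores the
    # target with a Transformer-XL conditioned on the context.
    return -math.log(VOCAB_SIZE)

def permutation_lm_log_likelihood(tokens, rng):
    # Log-likelihood of `tokens` under one sampled factorization order z.
    # Averaging this over many sampled permutations approximates the
    # expectation that XLNet's objective maximizes.
    order = list(range(len(tokens)))
    rng.shuffle(order)  # sample z uniformly from all permutations
    total = 0.0
    for t, pos in enumerate(order):
        # Context is x_{z_<t}: tokens earlier in the factorization order,
        # paired with their ORIGINAL positions (only the prediction order
        # is permuted, never the input sequence itself).
        context = [(p, tokens[p]) for p in order[:t]]
        total += toy_log_prob(tokens[pos], context)
    return total

rng = random.Random(0)
sequence = [3, 1, 4, 1, 5]
print(permutation_lm_log_likelihood(sequence, rng))

Note that because the input order never changes and no tokens are replaced with [MASK], the pretraining inputs look exactly like finetuning inputs, which is how this objective sidesteps the pretrain-finetune discrepancy the abstract attributes to BERT.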
Community shame: Not yet rated

📜 Similar Papers

In the same crypt – Computation & Language

🌅 🌅 Old Age

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, ... (+6 more)

cs.CL ๐Ÿ› NeurIPS ๐Ÿ“š 166.0K cites 8 years ago