Mixture Content Selection for Diverse Sequence Generation

September 04, 2019 · Entered Twilight · Conference on Empirical Methods in Natural Language Processing

TWILIGHT: Old Age
Predates the code-sharing era, a pioneer of its time

"Last commit was 6.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: .gitignore, CNNDM_data_loader.py, Dataset_details.md, LICENSE, QG_data_loader.py, README.md, _imgs, build_utils.py, configs.py, evaluate.py, layers, models.py, requirements.txt, train.py, utils

Authors: Jaemin Cho, Minjoon Seo, Hannaneh Hajishirzi
arXiv ID: 1909.01953
Category: cs.CL (Computation & Language)
Citations: 63
Venue: Conference on Empirical Methods in Natural Language Processing
Repository: https://github.com/clovaai/FocusSeq2Seq ⭐ 113
Last Checked: 1 month ago

Abstract
Generating diverse sequences is important in many NLP applications, such as question generation or summarization, that exhibit semantically one-to-many relationships between the source and target sequences. We present a method to explicitly separate diversification from generation using a general plug-and-play module (called SELECTOR) that wraps around and guides an existing encoder-decoder model. The diversification stage uses a mixture of experts to sample different binary masks on the source sequence for diverse content selection. The generation stage uses a standard encoder-decoder model given each selected content from the source sequence. Due to the non-differentiable nature of discrete sampling and the lack of ground-truth labels for the binary masks, we leverage a proxy for the ground-truth mask and adopt stochastic hard-EM for training. In question generation (SQuAD) and abstractive summarization (CNN-DM), our method demonstrates significant improvements in accuracy, diversity, and training efficiency, including state-of-the-art top-1 accuracy on both datasets, a 6% gain in top-5 accuracy, and 3.7 times faster training over a state-of-the-art model. Our code is publicly available at https://github.com/clovaai/FocusSeq2Seq.
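
The training scheme in the abstract (a mixture of experts that each predict a binary mask over the source, trained with hard-EM against a proxy mask) can be illustrated with a toy sketch. This is not the code from the FocusSeq2Seq repository. The proxy used here (a source token is selected if it also appears in the target), the per-token logit `Expert` class, and the names `proxy_mask`, `hard_em_step`, and `train_toy` are all assumptions made for illustration, and the encoder-decoder generation stage is omitted.

```python
import math
import random


def proxy_mask(source, target):
    """Assumed proxy label: 1 if the source token also occurs in the target."""
    target_tokens = set(target)
    return [1 if tok in target_tokens else 0 for tok in source]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class Expert:
    """Hypothetical toy expert: one learnable logit per vocabulary token."""

    def __init__(self, vocab, rng):
        self.logit = {tok: rng.uniform(-0.1, 0.1) for tok in vocab}

    def probs(self, source):
        return [sigmoid(self.logit[tok]) for tok in source]

    def nll(self, source, mask):
        """Binary cross-entropy of the mask under this expert."""
        eps = 1e-9
        return -sum(
            m * math.log(p + eps) + (1 - m) * math.log(1.0 - p + eps)
            for p, m in zip(self.probs(source), mask)
        )

    def update(self, source, mask, lr=0.5):
        """One gradient step; d(BCE)/d(logit) = p - m."""
        for tok, p, m in zip(source, self.probs(source), mask):
            self.logit[tok] -= lr * (p - m)

    def sample_mask(self, source, rng):
        """Sample a binary content-selection mask for the source."""
        return [1 if rng.random() < p else 0 for p in self.probs(source)]


def hard_em_step(experts, source, mask):
    """Hard E-step: pick the lowest-loss expert. M-step: update only that one."""
    losses = [expert.nll(source, mask) for expert in experts]
    best = min(range(len(experts)), key=lambda k: losses[k])
    experts[best].update(source, mask)
    return best, losses[best]


def train_toy(num_experts=2, epochs=30, seed=0):
    """Train on one source with two different targets (a one-to-many case)."""
    rng = random.Random(seed)
    source = ["cat", "sat", "on", "mat"]
    targets = [["the", "cat", "sat"], ["on", "the", "mat"]]
    masks = [proxy_mask(source, target) for target in targets]
    experts = [Expert(source, rng) for _ in range(num_experts)]
    history = []
    assignments = []
    for _ in range(epochs):
        assignments = []
        epoch_loss = 0.0
        for mask in masks:
            best, loss = hard_em_step(experts, source, mask)
            assignments.append(best)
            epoch_loss += loss
        history.append(epoch_loss)
    return experts, assignments, history, source


if __name__ == "__main__":
    experts, assignments, history, source = train_toy()
    print("expert chosen per target:", assignments)
    print("loss, first vs last epoch:", history[0], history[-1])
```

Because only the winning expert receives a gradient step, each expert drifts toward one of the two proxy masks, which is the specialization that lets different experts select different content from the same source at inference time.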