🌅
Old Age
Neural CRF Model for Sentence Alignment in Text Simplification
May 05, 2020 · 🏛 Annual Meeting of the Association for Computational Linguistics
"No code URL or promise found in abstract"
"HuggingFace models found (backfill)"
Evidence collected by the PWNC Scanner
Authors
Chao Jiang, Mounica Maddela, Wuwei Lan, Yang Zhong, Wei Xu
arXiv ID
2005.02324
Category
cs.CL: Computation & Language
Citations
178
Venue
Annual Meeting of the Association for Computational Linguistics
Repository
https://huggingface.co/datasets/GEM/wiki_auto_asset_turk
Abstract
The success of a text simplification system heavily depends on the quality and quantity of complex-simple sentence pairs in the training corpus, which are extracted by aligning sentences between parallel articles. To evaluate and improve sentence alignment quality, we create two manually annotated sentence-aligned datasets from two commonly used text simplification corpora, Newsela and Wikipedia. We propose a novel neural CRF alignment model which not only leverages the sequential nature of sentences in parallel documents but also utilizes a neural sentence pair model to capture semantic similarity. Experiments demonstrate that our proposed approach outperforms all previous work on the monolingual sentence alignment task by more than 5 points in F1. We apply our CRF aligner to construct two new text simplification datasets, Newsela-Auto and Wiki-Auto, which are much larger and of better quality than the existing datasets. A Transformer-based seq2seq model trained on our datasets establishes a new state of the art for text simplification in both automatic and human evaluation.
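The alignment idea in the abstract, pairwise semantic similarity combined with the sequential order of sentences, can be illustrated with a small Viterbi decoder. This is only a minimal sketch under stated assumptions, not the authors' model: a token-overlap (Jaccard) score stands in for the neural sentence pair model, and the `null_score` and `jump_penalty` values are hand-set, hypothetical stand-ins for parameters that a CRF would learn from annotated data.

```python
from typing import List, Optional


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity (assumption: stand-in for a neural pair scorer)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def align(
    simple: List[str],
    complex_: List[str],
    null_score: float = 0.2,     # hypothetical score for "no aligned sentence"
    jump_penalty: float = 0.1,   # hypothetical cost per position jumped
) -> List[Optional[int]]:
    """Label each simple sentence with a complex-sentence index or None."""
    if not simple:
        return []
    labels: List[Optional[int]] = [None] + list(range(len(complex_)))

    def emit(i: int, lab: Optional[int]) -> float:
        # Emission: how well simple[i] matches the candidate complex sentence.
        return null_score if lab is None else jaccard(simple[i], complex_[lab])

    def trans(prev: Optional[int], cur: Optional[int]) -> float:
        # Transition: neighbouring simple sentences tend to align to nearby
        # complex sentences, so large jumps are penalised.
        if prev is None or cur is None:
            return 0.0
        return -jump_penalty * abs(cur - prev)

    # Standard Viterbi recursion over the label lattice.
    score = [{lab: emit(0, lab) for lab in labels}]
    back = []
    for i in range(1, len(simple)):
        cur_score, cur_back = {}, {}
        for lab in labels:
            best_prev = max(labels, key=lambda p: score[-1][p] + trans(p, lab))
            cur_score[lab] = score[-1][best_prev] + trans(best_prev, lab) + emit(i, lab)
            cur_back[lab] = best_prev
        score.append(cur_score)
        back.append(cur_back)

    # Backtrack from the best final label.
    path = [max(labels, key=lambda lab: score[-1][lab])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

Calling `align(simple_sentences, complex_sentences)` returns one label per simple sentence, with `None` marking a sentence that has no counterpart in the complex article. Swapping `jaccard` for a learned scorer changes only the emission term; the decoding step stays the same.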
📜 Similar Papers
In the same crypt — Computation & Language
🌅
Old Age
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
🌅
Old Age
XLNet: Generalized Autoregressive Pretraining for Language Understanding
🔮
The Ethereal
Effective Approaches to Attention-based Neural Machine Translation
🌅
Old Age
A large annotated corpus for learning natural language inference