A Mixed-Language Multi-Document News Summarization Dataset and a Graphs-Based Extract-Generate Model

October 13, 2024 ยท Declared Dead ยท ๐Ÿ› North American Chapter of the Association for Computational Linguistics

๐Ÿ’€ CAUSE OF DEATH: 404 Not Found
Code link is broken/dead
Authors Shengxiang Gao, Fang nan, Yongbing Zhang, Yuxin Huang, Kaiwen Tan, Zhengtao Yu arXiv ID 2410.09773 Category cs.CL: Computation & Language Citations 3 Venue North American Chapter of the Association for Computational Linguistics Repository https://github.com/Southnf9/MLMD-news} Last Checked 1 month ago
Abstract
Existing research on news summarization primarily focuses on single-language single-document (SLSD), single-language multi-document (SLMD) or cross-language single-document (CLSD). However, in real-world scenarios, news about a international event often involves multiple documents in different languages, i.e., mixed-language multi-document (MLMD). Therefore, summarizing MLMD news is of great significance. However, the lack of datasets for MLMD news summarization has constrained the development of research in this area. To fill this gap, we construct a mixed-language multi-document news summarization dataset (MLMD-news), which contains four different languages and 10,992 source document cluster and target summary pairs. Additionally, we propose a graph-based extract-generate model and benchmark various methods on the MLMD-news dataset and publicly release our dataset and code\footnote[1]{https://github.com/Southnf9/MLMD-news}, aiming to advance research in summarization within MLMD scenarios.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Computation & Language

๐ŸŒ… ๐ŸŒ… Old Age

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, ... (+6 more)

cs.CL ๐Ÿ› NeurIPS ๐Ÿ“š 166.0K cites 8 years ago

Died the same way โ€” ๐Ÿ’€ 404 Not Found