Multilingual Non-Factoid Question Answering with Answer Paragraph Selection

August 20, 2024 · Declared Dead · 🏛 Pacific-Asia Conference on Knowledge Discovery and Data Mining

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Ritwik Mishra, Sreeram Vennam, Rajiv Ratn Shah, Ponnurangam Kumaraguru
arXiv ID: 2408.10604
Category: cs.CL (Computation & Language)
Cross-listed: cs.AI, cs.IR, cs.LG
Citations: 0
Venue: Pacific-Asia Conference on Knowledge Discovery and Data Mining
Last Checked: 3 months ago

Abstract

Most existing Question Answering Datasets (QuADs) primarily focus on factoid-based short-context Question Answering (QA) in high-resource languages. However, the scope of such datasets for low-resource languages remains limited, with only a few works centered on factoid-based QuADs and none on non-factoid QuADs. Therefore, this work presents MuNfQuAD, a multilingual QuAD with non-factoid questions. It uses interrogative sub-headings from BBC news articles as questions and the corresponding paragraphs as silver answers. The dataset comprises over 578K QA pairs across 38 languages, encompassing several low-resource languages, and stands as the largest multilingual QA dataset to date. Based on manual annotations of 790 QA pairs from MuNfQuAD (the golden set), we observe that 98% of questions can be answered using their corresponding silver answer. Our fine-tuned Answer Paragraph Selection (APS) model outperforms the baselines. The APS model attained an accuracy of 80% and 72%, as well as a macro F1 of 72% and 66%, on the MuNfQuAD test set and the golden set, respectively. Furthermore, the APS model generalizes effectively to a certain language within the golden set, even after being fine-tuned on silver labels. We also observe that the fine-tuned APS model is beneficial for reducing the context of a question. These findings suggest that this resource would be a valuable contribution to the QA research community.
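
The silver-labeling idea described in the abstract (interrogative sub-headings as questions, the paragraphs beneath them as silver answers) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The block representation, the function names `is_interrogative` and `extract_silver_pairs`, and the rule that a sub-heading is interrogative when it ends in a question mark are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical article representation: an ordered list of (kind, text) blocks,
# where kind is either "heading" or "paragraph".
Block = Tuple[str, str]

# Assumption: a sub-heading is treated as a question if it ends with a
# question mark (Latin, full-width, or Arabic form).
QUESTION_MARKS = ("?", "？", "؟")


def is_interrogative(heading: str) -> bool:
    """Return True if the heading looks like a question under the assumed rule."""
    return heading.strip().endswith(QUESTION_MARKS)


def extract_silver_pairs(blocks: List[Block]) -> List[Dict[str, object]]:
    """Pair each interrogative sub-heading with the paragraphs that follow it,
    up to the next sub-heading. Questions with no paragraphs are dropped."""
    pairs: List[Dict[str, object]] = []
    collecting = False  # True while we are under an interrogative sub-heading
    for kind, text in blocks:
        if kind == "heading":
            collecting = is_interrogative(text)
            if collecting:
                pairs.append({"question": text.strip(), "answer_paragraphs": []})
        elif kind == "paragraph" and collecting:
            pairs[-1]["answer_paragraphs"].append(text.strip())
    return [p for p in pairs if p["answer_paragraphs"]]


if __name__ == "__main__":
    article = [
        ("paragraph", "Intro text."),
        ("heading", "What is X?"),
        ("paragraph", "X is a thing."),
        ("paragraph", "More about X."),
        ("heading", "Background"),
        ("paragraph", "Not part of any answer."),
    ]
    for pair in extract_silver_pairs(article):
        print(pair["question"], pair["answer_paragraphs"])
```

In a sketch like this, paragraphs outside the selected span could serve as negative examples for an answer paragraph selection classifier, though the abstract does not say how negatives were actually chosen.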