XFEVER: Exploring Fact Verification across Languages
October 25, 2023 · Entered Twilight · Taiwan Conference on Computational Linguistics and Speech Processing
Repo contents: .flake8, .gitignore, .pre-commit-config.yaml, LICENSE, README.md, adafactor.py, evaluate.py, experiments, lightning_base.py, modeling_base.py, predict.py, processors.py, requirements.txt, train.py, utils.py
Authors
Yi-Chen Chang, Canasai Kruengkrai, Junichi Yamagishi
arXiv ID
2310.16278
Category
cs.CL: Computation and Language
Cross-listed
cs.AI
Citations
6
Venue
Taiwan Conference on Computational Linguistics and Speech Processing
Repository
https://github.com/nii-yamagishilab/xfever
⭐ 2
Last Checked
1 month ago
Abstract
This paper introduces the Cross-lingual Fact Extraction and VERification (XFEVER) dataset, designed for benchmarking fact verification models across different languages. We constructed it by translating the claim and evidence texts of the Fact Extraction and VERification (FEVER) dataset into six languages. The training and development sets were translated by machine translation, whereas the test set includes both professionally translated and machine-translated texts. Using XFEVER, we define two cross-lingual fact verification scenarios, zero-shot learning and translate-train learning, and propose baseline models for each. Experimental results show that a multilingual language model can be used to build fact verification models in different languages efficiently; however, performance varies by language and is somewhat inferior to the English case. We also found that model miscalibration can be effectively mitigated by considering the prediction similarity between the English and target languages. The XFEVER dataset, code, and model checkpoints are available at https://github.com/nii-yamagishilab/xfever.
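The abstract's calibration idea, reducing a model's confidence on a target-language input when its prediction disagrees with the prediction on the English counterpart, can be sketched as follows. This is a hedged illustration, not the paper's actual method: the function names (`softmax`, `js_divergence`, `calibrate`) and the linear temperature schedule are assumptions made for this sketch, not code from the XFEVER repository.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D logit vector.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def js_divergence(p, q):
    # Jensen-Shannon divergence: a symmetric disagreement measure
    # bounded by ln(2) for probability vectors p and q.
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def calibrate(target_logits, english_logits):
    # The more the target-language prediction disagrees with the
    # English one, the higher the temperature, i.e. the flatter
    # (less confident) the calibrated distribution becomes.
    p_t = softmax(target_logits)
    p_en = softmax(english_logits)
    disagreement = js_divergence(p_t, p_en) / np.log(2)  # normalized to [0, 1]
    temperature = 1.0 + disagreement  # assumption: linear schedule
    return softmax(target_logits / temperature)

# Example: the target-language model favors class 0, the English
# model favors class 1, so the calibrated output is less peaked.
p = calibrate(np.array([2.0, 0.5, 0.1]), np.array([0.2, 1.5, 0.1]))
```

When the two predictions agree exactly, the divergence is zero and the output is unchanged; any disagreement only softens confidence, which is one simple way to trade accuracy for better-calibrated probabilities across languages.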
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
Similar Papers
In the same crypt · Computation & Language
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding · R.I.P. · 👻 Ghosted
Language Models are Few-Shot Learners · R.I.P. · 👻 Ghosted
RoBERTa: A Robustly Optimized BERT Pretraining Approach · R.I.P. · 👻 Ghosted
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension · R.I.P. · 👻 Ghosted