Improving Cross-lingual Information Retrieval on Low-Resource Languages via Optimal Transport Distillation
January 29, 2023 ยท Declared Dead ยท ๐ Web Search and Data Mining
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Zhiqi Huang, Puxuan Yu, James Allan
arXiv ID
2301.12566
Category
cs.CL: Computation & Language
Cross-listed
cs.IR
Citations
36
Venue
Web Search and Data Mining
Last Checked
3 months ago
Abstract
Benefiting from transformer-based pre-trained language models, neural ranking models have made significant progress. More recently, the advent of multilingual pre-trained language models provides great support for designing neural cross-lingual retrieval models. However, due to unbalanced pre-training data in different languages, multilingual language models have already shown a performance gap between high and low-resource languages in many downstream tasks. And cross-lingual retrieval models built on such pre-trained models can inherit language bias, leading to suboptimal result for low-resource languages. Moreover, unlike the English-to-English retrieval task, where large-scale training collections for document ranking such as MS MARCO are available, the lack of cross-lingual retrieval data for low-resource language makes it more challenging for training cross-lingual retrieval models. In this work, we propose OPTICAL: Optimal Transport distillation for low-resource Cross-lingual information retrieval. To transfer a model from high to low resource languages, OPTICAL forms the cross-lingual token alignment task as an optimal transport problem to learn from a well-trained monolingual retrieval model. By separating the cross-lingual knowledge from knowledge of query document matching, OPTICAL only needs bitext data for distillation training, which is more feasible for low-resource languages. Experimental results show that, with minimal training data, OPTICAL significantly outperforms strong baselines on low-resource languages, including neural machine translation.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Computation & Language
๐
๐
Old Age
๐
๐
Old Age
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
R.I.P.
๐ป
Ghosted
Language Models are Few-Shot Learners
R.I.P.
๐ป
Ghosted
RoBERTa: A Robustly Optimized BERT Pretraining Approach
R.I.P.
๐ป
Ghosted
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
R.I.P.
๐ป
Ghosted
Deep contextualized word representations
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
๐ป
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
๐ป
Ghosted
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
R.I.P.
๐ป
Ghosted