Few-shot Adaptation Works with UnpredicTable Data
August 01, 2022 · Entered Twilight · Annual Meeting of the Association for Computational Linguistics
Repo contents: .gitmodules, MetaICL, README.md, dataset_demo.ipynb, download_and_process_all.sh, download_and_process_slice.sbatch, img, requirements.txt, tables_to_tasks.py
Authors
Jun Shern Chan, Michael Pieler, Jonathan Jao, Jérémy Scheurer, Ethan Perez
arXiv ID
2208.01009
Category
cs.CL: Computation & Language
Cross-listed
cs.AI, cs.LG
Citations
6
Venue
Annual Meeting of the Association for Computational Linguistics
Repository
https://github.com/JunShern/few-shot-adaptation
⭐ 24
Last Checked
1 month ago
Abstract
Prior work on language models (LMs) shows that training on a large number of diverse tasks improves few-shot learning (FSL) performance on new tasks. We take this to the extreme, automatically extracting 413,299 tasks from internet tables - orders of magnitude more than the next-largest public datasets. Finetuning on the resulting dataset leads to improved FSL performance on Natural Language Processing (NLP) tasks, but not proportionally to dataset scale. In fact, we find that narrow subsets of our dataset sometimes outperform more diverse datasets. For example, finetuning on software documentation from support.google.com raises FSL performance by a mean of +7.5% on 52 downstream tasks, which beats training on 40 human-curated NLP datasets (+6.7%). Finetuning on various narrow datasets leads to similar broad improvements across test tasks, suggesting that the gains are not from domain adaptation but adapting to FSL in general. We do not observe clear patterns between the datasets that lead to FSL gains, leaving open questions about why certain data helps with FSL.
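The abstract does not spell out how a table becomes a task, so the following is only a minimal sketch of one plausible conversion, not the authors' `tables_to_tasks.py`. It assumes each column can serve as the prediction target in turn, with the remaining columns of a row forming the input. The function name `table_to_tasks`, the `[column] value` input format, and the constant-output filter are all hypothetical choices made for illustration.

```python
from typing import Dict, List

Row = Dict[str, str]
Example = Dict[str, str]


def table_to_tasks(rows: List[Row], min_examples: int = 2) -> Dict[str, List[Example]]:
    """Build one candidate few-shot task per column of a table.

    Hypothetical sketch: every row becomes an example whose output is the
    value in the target column and whose input is the rest of the row.
    """
    if len(rows) < min_examples:
        return {}

    columns = list(rows[0].keys())
    tasks: Dict[str, List[Example]] = {}
    for target in columns:
        examples = []
        for row in rows:
            # Serialize the non-target cells as "[column] value" pairs.
            text = " ".join(f"[{col}] {row[col]}" for col in columns if col != target)
            examples.append({"input": text, "output": row[target]})
        # Assumed filter: a column with a single constant value is not a useful task.
        if len({ex["output"] for ex in examples}) > 1:
            tasks[target] = examples
    return tasks


# Toy table in the spirit of software documentation (made-up data).
table = [
    {"Shortcut": "Ctrl+C", "Action": "Copy"},
    {"Shortcut": "Ctrl+V", "Action": "Paste"},
    {"Shortcut": "Ctrl+Z", "Action": "Undo"},
]
tasks = table_to_tasks(table)
```

Under these assumptions, a table with N usable columns yields up to N tasks, which gives some intuition for how a large crawl of web tables could produce hundreds of thousands of tasks.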
Similar Papers
In the same crypt: Computation & Language
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
R.I.P.
👻
Ghosted
Language Models are Few-Shot Learners
R.I.P.
👻
Ghosted
RoBERTa: A Robustly Optimized BERT Pretraining Approach
R.I.P.
👻
Ghosted
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
R.I.P.
👻
Ghosted