Old Age
PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation
October 22, 2023 · Entered Twilight · Conference on Empirical Methods in Natural Language Processing
Repo contents: .gitignore, .havenignore, LICENSE, README.md, exp_configs, main.py, requirements.txt, scripts, src
Authors
Gaurav Sahu, Olga Vechtomova, Dzmitry Bahdanau, Issam H. Laradji
arXiv ID
2310.14192
Category
cs.CL: Computation & Language
Cross-listed
cs.AI
Citations
37
Venue
Conference on Empirical Methods in Natural Language Processing
Repository
https://github.com/ServiceNow/PromptMix-EMNLP-2023
⭐ 12
Last Checked
1 month ago
Abstract
Data augmentation is a widely used technique to address the problem of text classification when there is a limited amount of training data. Recent work often tackles this problem using large language models (LLMs) like GPT3 that can generate new examples given already available ones. In this work, we propose a method to generate more helpful augmented data by utilizing the LLM's abilities to follow instructions and perform few-shot classifications. Our specific PromptMix method consists of two steps: 1) generate challenging text augmentations near class boundaries; however, generating borderline examples increases the risk of false positives in the dataset, so we 2) relabel the text augmentations using a prompting-based LLM classifier to enhance the correctness of labels in the generated data. We evaluate the proposed method in challenging 2-shot and zero-shot settings on four text classification datasets: Banking77, TREC6, Subjectivity (SUBJ), and Twitter Complaints. Our experiments show that generating and, crucially, relabeling borderline examples facilitates the transfer of knowledge of a massive LLM like GPT3.5-turbo into smaller and cheaper classifiers like DistilBERT-base and BERT-base. Furthermore, 2-shot PromptMix outperforms multiple 5-shot data augmentation methods on the four datasets. Our code is available at https://github.com/ServiceNow/PromptMix-EMNLP-2023.
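The two steps described in the abstract translate naturally into a small augmentation loop. The sketch below is a minimal illustration of that loop, not the authors' implementation (see the linked repository for that): the `complete` callable standing in for an instruction-following LLM, the prompt wording, and all helper names are assumptions made for this example.

```python
# Minimal sketch of the two-step PromptMix idea from the abstract.
# `complete` stands in for any instruction-following LLM call; its exact
# interface is an assumption, not the paper's code.
from typing import Callable, Dict, List, Tuple

def generate_borderline(complete: Callable[[str], str],
                        seed_examples: Dict[str, List[str]],
                        cls_a: str, cls_b: str, n: int = 4) -> List[str]:
    """Step 1: ask the LLM for examples that mix two classes, so the
    generations land near the class boundary."""
    shots = "\n".join(f"{c}: {t}" for c, ts in seed_examples.items() for t in ts)
    prompt = (
        f"Here are labeled examples:\n{shots}\n\n"
        f"Write {n} new sentences that are mostly about '{cls_a}' but also "
        f"borrow elements of '{cls_b}', so they lie near the class boundary. "
        f"One sentence per line."
    )
    return [s.strip() for s in complete(prompt).splitlines() if s.strip()]

def relabel(complete: Callable[[str], str],
            text: str, classes: List[str]) -> str:
    """Step 2: relabel each generated example with an LLM classifier,
    since borderline generations may not match their intended label."""
    prompt = (f"Classify the sentence into one of {classes}. "
              f"Answer with the class name only.\n\nSentence: {text}\nClass:")
    answer = complete(prompt).strip()
    return answer if answer in classes else classes[0]  # crude fallback

def promptmix(complete: Callable[[str], str],
              seed_examples: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Generate borderline examples for every class pair, then relabel them."""
    classes = list(seed_examples)
    augmented = []
    for a in classes:
        for b in classes:
            if a == b:
                continue
            for text in generate_borderline(complete, seed_examples, a, b):
                augmented.append((text, relabel(complete, text, classes)))
    return augmented
```

Under this reading, the returned (text, label) pairs would be used to fine-tune a small classifier such as DistilBERT-base; the relabeling in step 2 is what keeps near-boundary generations from contaminating the training labels.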
Similar Papers
In the same crypt · Computation & Language
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
R.I.P. · 👻 Ghosted
Language Models are Few-Shot Learners
R.I.P. · 👻 Ghosted
RoBERTa: A Robustly Optimized BERT Pretraining Approach
R.I.P. · 👻 Ghosted
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
R.I.P. · 👻 Ghosted