Anonymized BERT: An Augmentation Approach to the Gendered Pronoun Resolution Challenge
May 06, 2019 · Entered Twilight · Proceedings of the First Workshop on Gender Bias in Natural Language Processing
"Last commit was 6.0 years ago (โฅ5 year threshold)"
Evidence collected by the PWNC Scanner
Repo contents: README.md, Step1_preprocessing.ipynb, Step2_end2end_model.ipynb, Step3_pure_bert_model.ipynb, Step4_inference.ipynb, gap-development-corrected-74.tsv, gap-test-val-85.tsv
Authors
Bo Liu
arXiv ID
1905.01780
Category
cs.CL: Computation & Language
Cross-listed
cs.AI, cs.LG
Citations
8
Venue
Proceedings of the First Workshop on Gender Bias in Natural Language Processing
Repository
https://github.com/boliu61/gendered-pronoun-resolution
⭐ 24
Last Checked
1 month ago
Abstract
We present our 7th place solution to the Gendered Pronoun Resolution challenge, which uses BERT without fine-tuning and a novel augmentation strategy designed for contextual-embedding token-level tasks. Our method anonymizes the referents by replacing candidate names with a set of common placeholder names. Besides the usual benefit of effectively increasing training data size, this approach diversifies the idiosyncratic information embedded in names. Using the same set of common first names also helps the model recognize names, shortens token length, and removes gender and regional biases associated with names. The system scored 0.1947 log loss in stage 2, where the augmentation contributed an improvement of 0.04. Post-competition analysis shows that, when using different embedding layers, the system scores 0.1799, which would have placed third.
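The augmentation described in the abstract is simple enough to illustrate in a few lines. Below is a minimal sketch of name anonymization for GAP-style passages with two named candidates; the placeholder-name pool, the function name, and the longest-name-first replacement order are illustrative assumptions, not the author's actual implementation (see the repository notebooks for that).

```python
# A minimal sketch of the name-anonymization augmentation. The placeholder
# pool and replacement strategy below are assumptions for illustration,
# not the paper's exact code.

# Hypothetical pool of common placeholder first names.
PLACEHOLDER_NAMES = ["Alice", "Kate", "James", "Michael"]

def anonymize_example(text, name_a, name_b, placeholder_a, placeholder_b):
    """Replace the two coreference candidate names in a passage
    with placeholder names, returning the augmented passage."""
    # Replace the longer name first so that one candidate is never
    # clobbered as a substring of the other (e.g. "Ann" inside "Anna").
    for original, placeholder in sorted(
        [(name_a, placeholder_a), (name_b, placeholder_b)],
        key=lambda pair: -len(pair[0]),
    ):
        text = text.replace(original, placeholder)
    return text

# Usage: emit several augmented copies of one training example, each
# with a different placeholder pair, to enlarge the training set.
passage = "Cheryl told Kathleen that she would arrive at noon."
for pa, pb in [("Alice", "Kate"), ("Mary", "Elizabeth")]:
    print(anonymize_example(passage, "Cheryl", "Kathleen", pa, pb))
```

Note that a faithful implementation would also update the character offsets that the GAP TSV files record for the pronoun and both candidates, since replacing names changes string lengths.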
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
Similar Papers
In the same crypt · Computation & Language
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
R.I.P. · 👻 Ghosted
Language Models are Few-Shot Learners
R.I.P. · 👻 Ghosted
RoBERTa: A Robustly Optimized BERT Pretraining Approach
R.I.P. · 👻 Ghosted
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
R.I.P. · 👻 Ghosted