FireBERT: Hardening BERT-based classifiers against adversarial attack

August 10, 2020 · Entered Twilight · 🏛 Advances in Intelligent Systems and Computing

🌅 TWILIGHT: Old Age
Predates the code-sharing era – a pioneer of its time

"Last commit was 5.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: FireBERT.pdf, LICENSE, README.md, analysis.ipynb, bert_base_model.py, bert_imdb_tuner.ipynb, bert_mnli_tuner.ipynb, eval_imdb.ipynb, eval_mnli.ipynb, firebert_base.py, firebert_fct.py, firebert_fse.py, firebert_fve.py, firebert_imdb_and_adversarial_co-tuner.ipynb, firebert_mnli_and_adversarial_co-tuner.ipynb, generate_adversarials.ipynb, processors.py, randomsearchIMDB.py, randomsearchIMDB.sh, randomsearchMNLI.py, randomsearchMNLI.sh, switch.py

Authors: Gunnar Mein, Kevin Hartman, Andrew Morris
arXiv ID: 2008.04203
Category: cs.CL (Computation & Language)
Cross-listed: cs.LG
Citations: 0
Venue: Advances in Intelligent Systems and Computing
Repository: https://github.com/FireBERT-author/FireBERT ⭐ 7
Last checked: 2 months ago
Abstract
We present FireBERT, a set of three proof-of-concept NLP classifiers hardened against TextFooler-style word-perturbation attacks by producing diverse alternatives to the original samples. In one approach, we co-tune BERT against the training data and synthetic adversarial samples. In a second approach, we generate the synthetic samples at evaluation time through substitution of words and perturbation of embedding vectors, then combine the diversified evaluation results by voting. A third approach replaces evaluation-time word substitution with perturbation of embedding vectors alone. We evaluate FireBERT on the MNLI and IMDB Movie Review datasets, both on original samples and on adversarial examples generated by TextFooler. We also test whether TextFooler is less successful at creating new adversarial samples when attacking FireBERT than when attacking unhardened classifiers. We show that it is possible to improve the accuracy of BERT-based models in the face of adversarial attacks without significantly reducing accuracy on regular benchmark samples. We present co-tuning with a synthetic data generator as a highly effective method, protecting against 95% of pre-manufactured adversarial samples while maintaining 98% of original benchmark performance. We also demonstrate evaluation-time perturbation as a promising direction for further research, restoring accuracy to up to 75% of benchmark performance on pre-made adversarials, and up to 65% (from a baseline of 75% orig. / 12% attack) under active attack by TextFooler.
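The third approach described in the abstract (perturbing embedding vectors at evaluation time and combining the results by voting) can be sketched in miniature. Everything below is hypothetical and not taken from the FireBERT repo: `toy_classifier` stands in for a real BERT classification head, and Gaussian noise is only one plausible choice of perturbation.

```python
import random

def toy_classifier(embedding):
    # Hypothetical stand-in for a BERT classification head:
    # predict label 1 if the mean activation is positive, else 0.
    return 1 if sum(embedding) / len(embedding) > 0 else 0

def vote_with_perturbed_embeddings(embedding, n_copies=9, scale=0.1, seed=0):
    """Classify the original embedding plus n_copies noisy variants,
    then return the majority label (a sketch of vector-perturbation voting)."""
    rng = random.Random(seed)  # seeded for reproducibility
    votes = [toy_classifier(embedding)]
    for _ in range(n_copies):
        # Add small Gaussian noise to every embedding dimension.
        noisy = [x + rng.gauss(0.0, scale) for x in embedding]
        votes.append(toy_classifier(noisy))
    # Majority vote over the original and perturbed copies.
    return max(set(votes), key=votes.count)

label = vote_with_perturbed_embeddings([0.3, -0.1, 0.25, 0.05])
```

The intuition is that an adversarial input sits near a decision boundary, so small random perturbations scatter its copies across both sides, while a genuinely in-class input keeps most of its copies on the correct side; the vote recovers the stable label.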

📜 Similar Papers

In the same crypt – Computation & Language

🌅 Old Age

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, ... (+6 more)

cs.CL ๐Ÿ› NeurIPS ๐Ÿ“š 166.0K cites 8 years ago