Generating Black-Box Adversarial Examples for Text Classifiers Using a Deep Reinforced Model

September 17, 2019 · Declared Dead · 🏛 ECML/PKDD

👻 CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Prashanth Vijayaraghavan, Deb Roy
arXiv ID: 1909.07873
Category: cs.LG (Machine Learning)
Cross-listed: cs.CL, cs.IR, stat.ML
Citations: 39
Venue: ECML/PKDD
Last Checked: 4 months ago

Abstract
Recently, generating adversarial examples has become an important means of measuring robustness of a deep learning model. Adversarial examples help us identify the susceptibilities of the model and further counter those vulnerabilities by applying adversarial training techniques. In natural language domain, small perturbations in the form of misspellings or paraphrases can drastically change the semantics of the text. We propose a reinforcement learning based approach towards generating adversarial examples in black-box settings. We demonstrate that our method is able to fool well-trained models for (a) IMDB sentiment classification task and (b) AG's news corpus news categorization task with significantly high success rates. We find that the adversarial examples generated are semantics-preserving perturbations to the original text.
