Trojaning Language Models for Fun and Profit
August 01, 2020 Β· Declared Dead Β· π European Symposium on Security and Privacy
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Xinyang Zhang, Zheng Zhang, Shouling Ji, Ting Wang
arXiv ID
2008.00312
Category
cs.CR: Cryptography & Security
Cross-listed
cs.CL,
cs.LG
Citations
158
Venue
European Symposium on Security and Privacy
Last Checked
4 months ago
Abstract
Recent years have witnessed the emergence of a new paradigm of building natural language processing (NLP) systems: general-purpose, pre-trained language models (LMs) are composed with simple downstream models and fine-tuned for a variety of NLP tasks. This paradigm shift significantly simplifies the system development cycles. However, as many LMs are provided by untrusted third parties, their lack of standardization or regulation entails profound security implications, which are largely unexplored. To bridge this gap, this work studies the security threats posed by malicious LMs to NLP systems. Specifically, we present TROJAN-LM, a new class of trojaning attacks in which maliciously crafted LMs trigger host NLP systems to malfunction in a highly predictable manner. By empirically studying three state-of-the-art LMs (BERT, GPT-2, XLNet) in a range of security-critical NLP tasks (toxic comment detection, question answering, text completion) as well as user studies on crowdsourcing platforms, we demonstrate that TROJAN-LM possesses the following properties: (i) flexibility - the adversary is able to flexibly dene logical combinations (e.g., 'and', 'or', 'xor') of arbitrary words as triggers, (ii) efficacy - the host systems misbehave as desired by the adversary with high probability when trigger-embedded inputs are present, (iii) specificity - the trojan LMs function indistinguishably from their benign counterparts on clean inputs, and (iv) fluency - the trigger-embedded inputs appear as fluent natural language and highly relevant to their surrounding contexts. We provide analytical justification for the practicality of TROJAN-LM, and further discuss potential countermeasures and their challenges, which lead to several promising research directions.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Cryptography & Security
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
The Limitations of Deep Learning in Adversarial Settings
R.I.P.
π»
Ghosted
Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
R.I.P.
π»
Ghosted
Spectre Attacks: Exploiting Speculative Execution
R.I.P.
π»
Ghosted
How To Backdoor Federated Learning
R.I.P.
π»
Ghosted
Evasion Attacks against Machine Learning at Test Time
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted