Rethinking the Reverse-engineering of Trojan Triggers
October 27, 2022 · Entered Twilight · Neural Information Processing Systems
Repo contents: LICENSE.md, README.md, config.py, dataloader.py, detection.py, image, mitigation.py, models.py, models, requirements.txt, resnet_nole.py, reverse_engineering.py, train_models, unet_blocks.py, unet_model.py
Authors
Zhenting Wang, Kai Mei, Hailun Ding, Juan Zhai, Shiqing Ma
arXiv ID
2210.15127
Category
cs.CR: Cryptography & Security
Cross-listed
cs.AI, cs.CV, cs.LG
Citations
51
Venue
Neural Information Processing Systems
Repository
https://github.com/RU-System-Software-and-Security/FeatureRE
⭐ 27
Last Checked
1 month ago
Abstract
Deep Neural Networks are vulnerable to Trojan (or backdoor) attacks. Reverse-engineering methods can reconstruct the trigger and thus identify affected models. Existing reverse-engineering methods only consider input space constraints, e.g., trigger size in the input space. Specifically, they assume the triggers are static patterns in the input space and fail to detect models with feature space triggers such as image style transformations. We observe that both input-space and feature-space Trojans are associated with feature space hyperplanes. Based on this observation, we design a novel reverse-engineering method that exploits the feature space constraint to reverse-engineer Trojan triggers. Results on four datasets and seven different attacks demonstrate that our solution effectively defends against both input-space and feature-space Trojans. It outperforms state-of-the-art reverse-engineering methods and other types of defenses in both Trojaned model detection and mitigation tasks. On average, the detection accuracy of our method is 93%. For Trojan mitigation, our method can reduce the ASR (attack success rate) to only 0.26% with the BA (benign accuracy) remaining nearly unchanged. Our code can be found at https://github.com/RU-System-Software-and-Security/FeatureRE.
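The abstract reports results in terms of ASR and BA. As a minimal sketch of how these two metrics are commonly computed, the snippet below works on plain lists of predicted and true labels. The function names are hypothetical and do not come from the FeatureRE repository, and excluding samples whose true label already equals the target label when computing ASR is an assumed convention, not something the abstract states.

```python
from typing import Sequence


def benign_accuracy(preds_clean: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of clean (trigger-free) inputs classified as their true label."""
    if len(preds_clean) != len(labels) or len(labels) == 0:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == y for p, y in zip(preds_clean, labels))
    return correct / len(labels)


def attack_success_rate(
    preds_triggered: Sequence[int], labels: Sequence[int], target_label: int
) -> float:
    """Fraction of triggered inputs that the model assigns to the target label.

    Assumption: samples whose true label is already the target label are
    skipped, since classifying them as the target is not evidence of an attack.
    """
    if len(preds_triggered) != len(labels):
        raise ValueError("predictions and labels must be equal in length")
    pairs = [(p, y) for p, y in zip(preds_triggered, labels) if y != target_label]
    if not pairs:
        raise ValueError("no samples with a non-target true label")
    hits = sum(p == target_label for p, _ in pairs)
    return hits / len(pairs)


# Toy data (made up for illustration, not results from the paper).
labels = [0, 1, 2, 1, 0]
preds_clean = [0, 1, 2, 2, 0]       # model output on clean inputs
preds_triggered = [0, 0, 0, 1, 0]   # model output on the same inputs with a trigger
ba = benign_accuracy(preds_clean, labels)
asr = attack_success_rate(preds_triggered, labels, target_label=0)
```

A successful mitigation in this sense drives `asr` toward zero while leaving `ba` close to its pre-mitigation value.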
Similar Papers
In the same crypt: Cryptography & Security
Membership Inference Attacks against Machine Learning Models
The Limitations of Deep Learning in Adversarial Settings
Practical Black-Box Attacks against Machine Learning
Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks