Explaining Away Attacks Against Neural Networks

March 06, 2020 · Entered Twilight · 🏛 arXiv.org

🌅 TWILIGHT: Old Age
Predates the code-sharing era – a pioneer of its time

"Last commit was 5.0 years ago (โ‰ฅ5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: .gitignore, LICENSE, README.md, dict_class_to_idx.pkl, elephant.png, explain_away_attacks.ipynb, generate_integrated_gradients.py, images, map_clsloc.txt

Authors: Sean Saito, Jin Wang
arXiv ID: 2003.05748
Category: cs.LG (Machine Learning)
Cross-listed: cs.CR, cs.CV, stat.ML
Citations: 0
Venue: arXiv.org
Repository: https://github.com/seansaito/Explaining-Away-Attacks-Against-Neural-Networks (⭐ 5)
Last checked: 2 months ago
Abstract
We investigate the problem of identifying adversarial attacks on image-based neural networks. We present intriguing experimental results showing significant discrepancies between the explanations generated for the predictions of a model on clean and adversarial data. Utilizing this intuition, we propose a framework which can identify whether a given input is adversarial based on the explanations given by the model. Code for our experiments can be found here: https://github.com/seansaito/Explaining-Away-Attacks-Against-Neural-Networks.
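As an illustration of the approach the abstract describes, below is a minimal sketch (not the authors' implementation) of producing an explanation with integrated gradients, the attribution method suggested by generate_integrated_gradients.py in the repository, and summarizing it with a single statistic that a detector could compare between clean and adversarial inputs. The ResNet-50 model, the concentration statistic, and all function names are illustrative assumptions.

```python
# A minimal sketch, assuming integrated gradients as the explanation method and
# a torchvision ResNet-50 as the classifier; everything here is illustrative,
# not the authors' code.

import torch
import torchvision.models as models


def integrated_gradients(model, x, target_class, baseline=None, steps=16):
    """Approximate integrated gradients of the target-class logit w.r.t. x.

    x is a preprocessed image tensor of shape (1, C, H, W); a modest number of
    steps keeps the single batched forward/backward pass cheap.
    """
    if baseline is None:
        baseline = torch.zeros_like(x)  # all-black baseline image
    # Straight-line path from the baseline to the input.
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1, 1, 1)
    path = baseline + alphas * (x - baseline)      # (steps, C, H, W)
    path.requires_grad_(True)
    logits = model(path)                           # forward pass over the whole path
    score = logits[:, target_class].sum()
    grads, = torch.autograd.grad(score, path)
    avg_grads = grads.mean(dim=0)                  # average gradient along the path
    return (x - baseline).squeeze(0) * avg_grads   # elementwise attributions


def explanation_concentration(attributions):
    """One crude summary of an attribution map: how concentrated it is."""
    a = attributions.abs()
    return (a / (a.sum() + 1e-8)).max().item()


if __name__ == "__main__":
    # Untrained weights and a random image keep the sketch self-contained;
    # a real run would use pretrained weights and properly preprocessed images.
    model = models.resnet50().eval()
    x = torch.rand(1, 3, 224, 224)
    pred = model(x).argmax(dim=1).item()
    attr = integrated_gradients(model, x, pred)
    # A detector in the spirit of the paper would compare statistics like this
    # against their distribution on clean images and flag outliers as adversarial.
    print(pred, explanation_concentration(attr))
```

The detection step itself is only gestured at here: the paper's framework decides whether an input is adversarial from discrepancies between its explanation and those typical of clean data, and the concentration statistic above stands in for whatever features such a detector would actually use.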
Community shame: Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

📜 Similar Papers

In the same crypt – Machine Learning