Explaining Away Attacks Against Neural Networks

March 06, 2020 · Entered Twilight · 🏛 arXiv.org

🌅 TWILIGHT: Old Age
Predates the code-sharing era – a pioneer of its time

"Last commit was 5.0 years ago (โ‰ฅ5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: .gitignore, LICENSE, README.md, dict_class_to_idx.pkl, elephant.png, explain_away_attacks.ipynb, generate_integrated_gradients.py, images, map_clsloc.txt

Authors: Sean Saito, Jin Wang
arXiv ID: 2003.05748
Category: cs.LG (Machine Learning)
Cross-listed: cs.CR, cs.CV, stat.ML
Citations: 0
Venue: arXiv.org
Repository: https://github.com/seansaito/Explaining-Away-Attacks-Against-Neural-Networks (⭐ 5)
Last checked: 2 months ago
Abstract
We investigate the problem of identifying adversarial attacks on image-based neural networks. We present intriguing experimental results showing significant discrepancies between the explanations generated for the predictions of a model on clean and adversarial data. Utilizing this intuition, we propose a framework which can identify whether a given input is adversarial based on the explanations given by the model. Code for our experiments can be found here: https://github.com/seansaito/Explaining-Away-Attacks-Against-Neural-Networks.
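As an illustration of the approach the abstract describes, below is a minimal sketch (not the authors' implementation) of producing an explanation with integrated gradients, the attribution method suggested by generate_integrated_gradients.py in the repository, and summarizing it with a single statistic that a detector could compare between clean and adversarial inputs. The ResNet-50 model, the concentration statistic, and all function names are illustrative assumptions.

```python
# A minimal sketch, assuming integrated gradients as the explanation method and
# a torchvision ResNet-50 as the classifier; everything here is illustrative,
# not the authors' code.

import torch
import torchvision.models as models


def integrated_gradients(model, x, target_class, baseline=None, steps=16):
    """Approximate integrated gradients of the target-class logit w.r.t. x.

    x is a preprocessed image tensor of shape (1, C, H, W); a modest number of
    steps keeps the single batched forward/backward pass cheap.
    """
    if baseline is None:
        baseline = torch.zeros_like(x)  # all-black baseline image
    # Straight-line path from the baseline to the input.
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1, 1, 1)
    path = baseline + alphas * (x - baseline)      # (steps, C, H, W)
    path.requires_grad_(True)
    logits = model(path)                           # forward pass over the whole path
    score = logits[:, target_class].sum()
    grads, = torch.autograd.grad(score, path)
    avg_grads = grads.mean(dim=0)                  # average gradient along the path
    return (x - baseline).squeeze(0) * avg_grads   # elementwise attributions


def explanation_concentration(attributions):
    """One crude summary of an attribution map: how concentrated it is."""
    a = attributions.abs()
    return (a / (a.sum() + 1e-8)).max().item()


if __name__ == "__main__":
    # Untrained weights and a random image keep the sketch self-contained;
    # a real run would use pretrained weights and properly preprocessed images.
    model = models.resnet50().eval()
    x = torch.rand(1, 3, 224, 224)
    pred = model(x).argmax(dim=1).item()
    attr = integrated_gradients(model, x, pred)
    # A detector in the spirit of the paper would compare statistics like this
    # against their distribution on clean images and flag outliers as adversarial.
    print(pred, explanation_concentration(attr))
```

The detection step itself is only gestured at here: the paper's framework decides whether an input is adversarial from discrepancies between its explanation and those typical of clean data, and the concentration statistic above stands in for whatever features such a detector would actually use.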
Community shame: Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

📜 Similar Papers

In the same crypt – Machine Learning