The Perceived Fragility of Explanations in Audio Models: Manipulation of Attribution with Unchanged Predictions

June 12, 2026 · Grace Period · 🏛 the ICML 2026 Workshop on Machine Learning for Audio: 5 pages

โณ Grace Period
This paper is less than 90 days old. We give authors time to release their code before passing judgment.
Authors: Piotr Kitłowski, Dominik Wiącek, Mateusz Modrzejewski
arXiv ID: 2606.14466
Category: cs.SD (Sound)
Cross-listed: cs.AI, cs.LG
Citations: 0
Venue: the ICML 2026 Workshop on Machine Learning for Audio: 5 pages
Abstract
This paper investigates the fragility of post-hoc explanation methods in audio deepfake detection. While previous work on explanation manipulation focused on images using standard $L_p$ metrics, we introduce a psychoacoustic framework that optimizes inaudible perturbations to decouple model attributions from final classifications. We evaluate this vulnerability across state-of-the-art architectures under strict prediction-preserving constraints. By evaluating the manipulation cost through domain-specific perceptual audio quality metrics alongside explanation alignment criteria, our framework demonstrates that an adversary can systematically distort automated explanation heatmaps while preserving the predicted deepfake label. Full code available at: https://github.com/cncPomper/Audio-XAI
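As a rough illustration of the idea in the abstract (changing an explanation while the predicted label stays fixed), the sketch below runs a simple random search on a toy model. It is not the paper's method: the two-layer `tanh` scorer, the weights `W` and `v`, the plain gradient saliency, the arbitrary `target` map, and the function `manipulate` are all hypothetical, and the L-infinity budget `eps` is only a crude stand-in for the psychoacoustic constraint the authors describe.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))  # hypothetical toy weights, not a real detector
v = rng.normal(size=8)

def logit(x):
    """Toy differentiable score; its sign plays the role of the predicted label."""
    return float(v @ np.tanh(W @ x))

def saliency(x):
    """Gradient of the logit w.r.t. the input (plain gradient attribution)."""
    return W.T @ (v * (1.0 - np.tanh(W @ x) ** 2))

def manipulate(x, target, eps=0.05, steps=2000, step_size=0.01):
    """Random search for delta with max|delta| <= eps that moves the saliency
    map toward `target` while keeping the predicted label unchanged."""
    label = logit(x) > 0
    delta = np.zeros_like(x)
    best = np.linalg.norm(saliency(x) - target)
    for _ in range(steps):
        # Propose a small random step and clip it back into the budget.
        cand = np.clip(delta + step_size * rng.normal(size=x.shape), -eps, eps)
        if (logit(x + cand) > 0) != label:
            continue  # reject: the prediction would flip
        dist = np.linalg.norm(saliency(x + cand) - target)
        if dist < best:  # accept only if the explanation moves closer to the target
            delta, best = cand, dist
    return delta, best

x = rng.normal(size=16)
target = np.roll(saliency(x), 4)  # arbitrary target explanation for the demo
initial_dist = np.linalg.norm(saliency(x) - target)
delta, final_dist = manipulate(x, target)

print("label preserved:", (logit(x) > 0) == (logit(x + delta) > 0))
print("saliency distance to target:", initial_dist, "->", final_dist)
```

Because a candidate is accepted only when the label is unchanged and the distance to the target shrinks, the search can never flip the prediction or make the explanation mismatch worse. How far it actually moves the saliency map depends on the toy model and the budget.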