The Perceived Fragility of Explanations in Audio Models: Manipulation of Attribution with Unchanged Predictions

June 12, 2026 · Grace Period · 🏛 the ICML 2026 Workshop on Machine Learning for Audio: 5 pages

Authors: Piotr Kitłowski, Dominik Wiącek, Mateusz Modrzejewski · arXiv ID: 2606.14466 · Category: cs.SD (Sound) · Cross-listed: cs.AI, cs.LG · Citations: 0

Abstract

This paper investigates the fragility of post-hoc explanation methods in audio deepfake detection. Whereas previous work on explanation manipulation focused on images and relied on standard $L_p$ metrics, we introduce a psychoacoustic framework that optimizes inaudible perturbations to decouple model attributions from the final classification. We evaluate this vulnerability across state-of-the-art architectures under strict prediction-preserving constraints. We measure the cost of manipulation with domain-specific perceptual audio quality metrics alongside explanation alignment criteria, and show that an adversary can systematically distort automated explanation heatmaps while preserving the predicted deepfake label. Full code is available at: https://github.com/cncPomper/Audio-XAI
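
To make the idea of decoupling attributions from predictions concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a hypothetical linear detector with gradient-times-input attributions, and it swaps the psychoacoustic constraint for a plain $L_2$ budget `eps` purely for brevity. All names (`logit`, `attribution`, `project`, `manipulate`) and all numeric values are illustrative assumptions. Projected gradient descent pulls the attribution toward an arbitrary target while every iterate stays on the set of perturbations that leave the logit unchanged.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def logit(w, b, x):
    """Toy linear detector (assumption): a positive logit means 'fake'."""
    return dot(w, x) + b

def attribution(w, x):
    """Gradient-times-input; for a linear model the gradient is simply w."""
    return [wi * xi for wi, xi in zip(w, x)]

def dist2(a, t):
    """Squared Euclidean distance between an attribution and a target."""
    return sum((ai - ti) ** 2 for ai, ti in zip(a, t))

def project(delta, w, eps):
    """Project delta onto {d : w.d = 0, ||d||_2 <= eps}.

    w.d = 0 keeps the logit (and hence the label) exactly unchanged.
    The L2 ball is a stand-in for a perceptual constraint (assumption).
    """
    scale = dot(delta, w) / dot(w, w)
    d = [di - scale * wi for di, wi in zip(delta, w)]
    norm = math.sqrt(dot(d, d))
    if norm > eps:
        d = [di * eps / norm for di in d]
    return d

def manipulate(w, x, target, eps=0.5, lr=0.1, steps=200):
    """Projected gradient descent on ||attribution(x + delta) - target||^2."""
    delta = [0.0] * len(x)
    for _ in range(steps):
        attr = attribution(w, [xi + di for xi, di in zip(x, delta)])
        # Gradient of the squared distance with respect to delta_i.
        grad = [2.0 * wi * (ai - ti) for wi, ai, ti in zip(w, attr, target)]
        delta = project([di - lr * gi for di, gi in zip(delta, grad)], w, eps)
    return delta

# Hypothetical weights, input, and target attribution map.
w, b = [1.0, -2.0, 0.5, 1.5], 0.1
x = [0.8, -0.3, 0.2, 0.4]
target = [0.0, 0.0, 0.0, 1.0]

delta = manipulate(w, x, target)
x_adv = [xi + di for xi, di in zip(x, delta)]

before = dist2(attribution(w, x), target)
after = dist2(attribution(w, x_adv), target)
print("logit before/after:", logit(w, b, x), logit(w, b, x_adv))
print("distance to target before/after:", before, after)
```

In this toy setting the logit is preserved up to floating-point error while the attribution moves closer to the target. The target cannot be matched exactly, because gradient-times-input attributions of a linear model always sum to the same value when the logit is held fixed. A realistic attack on a deep audio model would replace the closed-form projection with a differentiable prediction-preservation penalty and a perceptual loss.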