R.I.P.
๐ป
Ghosted
Quality-Diversity Evolution for Discovering Diverse Vulnerabilities in LLM Safety
May 30, 2026 ยท Grace Period ยท ๐ the ICLR 2026 Workshop on Agents in the Wild
Authors
Subhadip Mitra
arXiv ID
2606.00801
Category
cs.CR: Cryptography & Security
Cross-listed
cs.CL,
cs.ET,
cs.LG,
cs.NE
Citations
0
Venue
the ICLR 2026 Workshop on Agents in the Wild
Abstract
Current approaches to LLM adversarial testing suffer from coverage gaps: manual red-teaming does not scale, LLM-as-attacker methods exhibit mode collapse, and gradient-based approaches produce uninterpretable gibberish. We introduce a quality-diversity evolutionary framework that operates at the semantic level, evolving interpretable attack strategies rather than token sequences. Using MAP-Elites, we maintain a diverse archive of attacks across behavioral dimensions (strategy type, encoding method, prompt length). In experiments across GPT-4o-mini, Claude 3.5 Sonnet, Gemini 2.0 Flash, and an open-weight coding model (Devstral-small-2), we discover distinct vulnerability profiles: GPT-4o-mini is vulnerable to hypothetical and multi-turn framing combined with ROT13 encoding (fitness 0.8), Gemini to direct attacks with ROT13 and multi-turn with Leetspeak (0.8), while Claude shows uniformly ambiguous responses across all strategies (max 0.4). The semantic representation produces interpretable attacks that reveal systematic, model-specific weaknesses, providing actionable insights for improving LLM safety and a reproducible baseline for evaluating future frontier models. Code and experiment artifacts are released at https://github.com/bassrehab/red-queen.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Cryptography & Security
R.I.P.
๐ป
Ghosted
The Limitations of Deep Learning in Adversarial Settings
R.I.P.
๐ป
Ghosted
Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
R.I.P.
๐ป
Ghosted
Spectre Attacks: Exploiting Speculative Execution
R.I.P.
๐ป
Ghosted
How To Backdoor Federated Learning
R.I.P.
๐ป
Ghosted