How Not to Detect Prompt Injections with an LLM

July 08, 2025 ยท Declared Dead ยท ๐Ÿ› AISec@CCS

๐Ÿ‘ป CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Sarthak Choudhary, Divyam Anshumaan, Nils Palumbo, Somesh Jha arXiv ID 2507.05630 Category cs.CR: Cryptography & Security Cross-listed cs.AI, cs.LG Citations 5 Venue AISec@CCS Last Checked 3 months ago
Abstract
LLM-integrated applications and agents are vulnerable to prompt injection attacks, where adversaries embed malicious instructions within seemingly benign input data to manipulate the LLM's intended behavior. Recent defenses based on known-answer detection (KAD) scheme have reported near-perfect performance by observing an LLM's output to classify input data as clean or contaminated. KAD attempts to repurpose the very susceptibility to prompt injection as a defensive mechanism. We formally characterize the KAD scheme and uncover a structural vulnerability that invalidates its core security premise. To exploit this fundamental vulnerability, we methodically design an adaptive attack, DataFlip. It consistently evades KAD defenses, achieving detection rates as low as $0\%$ while reliably inducing malicious behavior with a success rate of $91\%$, all without requiring white-box access to the LLM or any optimization procedures.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Cryptography & Security

Died the same way โ€” ๐Ÿ‘ป Ghosted