R.I.P.
π»
Ghosted
Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition
October 24, 2023 Β· Entered Twilight Β· π arXiv.org
Repo contents: .gitignore, LICENSE.md, README.md, analysis, hackaprompt, pyproject.toml, requirements.txt, tests
Authors
Sander Schulhoff, Jeremy Pinto, Anaum Khan, Louis-FranΓ§ois Bouchard, Chenglei Si, Svetlina Anati, Valen Tagliabue, Anson Liu Kost, Christopher Carnahan, Jordan Boyd-Graber
arXiv ID
2311.16119
Category
cs.CR: Cryptography & Security
Cross-listed
cs.AI,
cs.CL
Citations
66
Venue
arXiv.org
Repository
https://github.com/PromptLabs/hackaprompt
β 22
Last Checked
1 month ago
Abstract
Large Language Models (LLMs) are deployed in interactive contexts with direct user engagement, such as chatbots and writing assistants. These deployments are vulnerable to prompt injection and jailbreaking (collectively, prompt hacking), in which models are manipulated to ignore their original instructions and follow potentially malicious ones. Although widely acknowledged as a significant security threat, there is a dearth of large-scale resources and quantitative studies on prompt hacking. To address this lacuna, we launch a global prompt hacking competition, which allows for free-form human input attacks. We elicit 600K+ adversarial prompts against three state-of-the-art LLMs. We describe the dataset, which empirically verifies that current LLMs can indeed be manipulated via prompt hacking. We also present a comprehensive taxonomical ontology of the types of adversarial prompts.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Cryptography & Security
R.I.P.
π»
Ghosted
Membership Inference Attacks against Machine Learning Models
R.I.P.
π»
Ghosted
The Limitations of Deep Learning in Adversarial Settings
R.I.P.
π»
Ghosted
Practical Black-Box Attacks against Machine Learning
R.I.P.
π»
Ghosted
Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
R.I.P.
π»
Ghosted