PrivacyXray: Detecting Privacy Breaches in LLMs through Semantic Consistency and Probability Certainty
June 24, 2025 Β· Declared Dead Β· π USENIX Security Symposium
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Jinwen He, Yiyang Lu, Zijin Lin, Kai Chen, Yue Zhao
arXiv ID
2506.19563
Category
cs.CR: Cryptography & Security
Cross-listed
cs.AI
Citations
1
Venue
USENIX Security Symposium
Last Checked
4 months ago
Abstract
Large Language Models (LLMs) are widely used in sensitive domains, including healthcare, finance, and legal services, raising concerns about potential private information leaks during inference. Privacy extraction attacks, such as jailbreaking, expose vulnerabilities in LLMs by crafting inputs that force the models to output sensitive information. However, these attacks cannot verify whether the extracted private information is accurate, as no public datasets exist for cross-validation, leaving a critical gap in private information detection during inference. To address this, we propose PrivacyXray, a novel framework detecting privacy breaches by analyzing LLM inner states. Our analysis reveals that LLMs exhibit higher semantic coherence and probabilistic certainty when generating correct private outputs. Based on this, PrivacyXray detects privacy breaches using four metrics: intra-layer and inter-layer semantic similarity, token-level and sentence-level probability distributions. PrivacyXray addresses critical challenges in private information detection by overcoming the lack of open-source private datasets and eliminating reliance on external data for validation. It achieves this through the synthesis of realistic private data and a detection mechanism based on the inner states of LLMs. Experiments show that PrivacyXray achieves consistent performance, with an average accuracy of 92.69% across five LLMs. Compared to state-of-the-art methods, PrivacyXray achieves significant improvements, with an average accuracy increase of 20.06%, highlighting its stability and practical utility in real-world applications.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Cryptography & Security
R.I.P.
π»
Ghosted
R.I.P.
π»
Ghosted
The Limitations of Deep Learning in Adversarial Settings
R.I.P.
π»
Ghosted
Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
R.I.P.
π»
Ghosted
Spectre Attacks: Exploiting Speculative Execution
R.I.P.
π»
Ghosted
How To Backdoor Federated Learning
R.I.P.
π»
Ghosted
Evasion Attacks against Machine Learning at Test Time
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted