UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images
May 06, 2024 ยท Declared Dead ยท ๐ Conference on Computer and Communications Security
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Yiting Qu, Xinyue Shen, Yixin Wu, Michael Backes, Savvas Zannettou, Yang Zhang
arXiv ID
2405.03486
Category
cs.CR: Cryptography & Security
Cross-listed
cs.CV,
cs.SI
Citations
40
Venue
Conference on Computer and Communications Security
Last Checked
3 months ago
Abstract
With the advent of text-to-image models and concerns about their misuse, developers are increasingly relying on image safety classifiers to moderate their generated unsafe images. Yet, the performance of current image safety classifiers remains unknown for both real-world and AI-generated images. In this work, we propose UnsafeBench, a benchmarking framework that evaluates the effectiveness and robustness of image safety classifiers, with a particular focus on the impact of AI-generated images on their performance. First, we curate a large dataset of 10K real-world and AI-generated images that are annotated as safe or unsafe based on a set of 11 unsafe categories of images (sexual, violent, hateful, etc.). Then, we evaluate the effectiveness and robustness of five popular image safety classifiers, as well as three classifiers that are powered by general-purpose visual language models. Our assessment indicates that existing image safety classifiers are not comprehensive and effective enough to mitigate the multifaceted problem of unsafe images. Also, there exists a distribution shift between real-world and AI-generated images in image qualities, styles, and layouts, leading to degraded effectiveness and robustness. Motivated by these findings, we build a comprehensive image moderation tool called PerspectiveVision, which improves the effectiveness and robustness of existing classifiers, especially on AI-generated images. UnsafeBench and PerspectiveVision can aid the research community in better understanding the landscape of image safety classification in the era of generative AI.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Cryptography & Security
R.I.P.
๐ป
Ghosted
R.I.P.
๐ป
Ghosted
Membership Inference Attacks against Machine Learning Models
R.I.P.
๐ป
Ghosted
The Limitations of Deep Learning in Adversarial Settings
R.I.P.
๐ป
Ghosted
Practical Black-Box Attacks against Machine Learning
R.I.P.
๐ป
Ghosted
Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
R.I.P.
๐ป
Ghosted
Extracting Training Data from Large Language Models
Died the same way โ ๐ป Ghosted
R.I.P.
๐ป
Ghosted
Language Models are Few-Shot Learners
R.I.P.
๐ป
Ghosted
PyTorch: An Imperative Style, High-Performance Deep Learning Library
R.I.P.
๐ป
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
๐ป
Ghosted