R.I.P.
👻
Ghosted
Mass-Producing Failures of Multimodal Systems with Language Models
June 21, 2023 · Entered Twilight · Neural Information Processing Systems
Repo contents: Demo, Images, Multimon_emoji.png, Pipeline, README.md, User Study (Answers).xlsx, User_Study.csv, annotate.py, requirements.txt, scrape.py
Authors
Shengbang Tong, Erik Jones, Jacob Steinhardt
arXiv ID
2306.12105
Category
cs.LG: Machine Learning
Cross-listed
cs.CL, cs.SE
Citations
45
Venue
Neural Information Processing Systems
Repository
https://github.com/tsb0601/MultiMon
⭐ 25
Last Checked
1 month ago
Abstract
Deployed multimodal systems can fail in ways that evaluators did not anticipate. In order to find these failures before deployment, we introduce MultiMon, a system that automatically identifies systematic failures -- generalizable, natural-language descriptions of patterns of model failures. To uncover systematic failures, MultiMon scrapes a corpus for examples of erroneous agreement: inputs that produce the same output, but should not. It then prompts a language model (e.g., GPT-4) to find systematic patterns of failure and describe them in natural language. We use MultiMon to find 14 systematic failures (e.g., "ignores quantifiers") of the CLIP text-encoder, each comprising hundreds of distinct inputs (e.g., "a shelf with a few/many books"). Because CLIP is the backbone for most state-of-the-art multimodal systems, these inputs produce failures in Midjourney 5.1, DALL-E, VideoFusion, and others. MultiMon can also steer towards failures relevant to specific use cases, such as self-driving cars. We see MultiMon as a step towards evaluation that autonomously explores the long tail of potential system failures. Code for MultiMon is available at https://github.com/tsb0601/MultiMon.
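The "erroneous agreement" step in the abstract can be illustrated with a small sketch. It is not taken from the MultiMon repository. It assumes that "same output" means near-identical embeddings under the system being tested, and that a separate reference embedding indicates the two inputs actually differ. The function name `erroneous_agreements`, the thresholds `agree` and `differ`, and the toy vectors are all hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def erroneous_agreements(texts, model_emb, ref_emb, agree=0.95, differ=0.8):
    """Return text pairs the tested model embeds almost identically
    (similarity >= agree) although the reference embedding says they
    differ (similarity <= differ). Both thresholds are assumed values."""
    pairs = []
    for i, j in combinations(range(len(texts)), 2):
        same_for_model = cosine(model_emb[i], model_emb[j]) >= agree
        different_for_ref = cosine(ref_emb[i], ref_emb[j]) <= differ
        if same_for_model and different_for_ref:
            pairs.append((texts[i], texts[j]))
    return pairs


# Toy data: hand-written vectors stand in for real encoder outputs.
texts = ["a shelf with a few books", "a shelf with many books", "a red car"]
model_emb = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]  # tested encoder (hypothetical)
ref_emb = [[1.0, 0.0], [0.5, 0.8], [0.0, 1.0]]      # reference signal (hypothetical)

pairs = erroneous_agreements(texts, model_emb, ref_emb)
print(pairs)
```

With these toy vectors only the few/many pair is flagged. In the pipeline the abstract describes, pairs like this would then be passed to a language model, which is prompted to describe the shared pattern in natural language.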
Similar Papers
In the same crypt: Machine Learning
R.I.P.
👻
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
👻
Ghosted
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
R.I.P.
👻
Ghosted
Semi-Supervised Classification with Graph Convolutional Networks
R.I.P.
👻
Ghosted
Proximal Policy Optimization Algorithms
R.I.P.
👻
Ghosted