R.I.P.
👻
Ghosted
Data Feedback Loops: Model-driven Amplification of Dataset Biases
September 08, 2022 · Entered Twilight · International Conference on Machine Learning
Repo contents: .gitignore, LICENSE, README.md, data, image_classification, language_generation, plotting, requirements.txt, visual_role_labeling
Authors
Rohan Taori, Tatsunori B. Hashimoto
arXiv ID
2209.03942
Category
cs.LG: Machine Learning
Cross-listed
cs.AI, cs.CL, cs.CV, stat.ML
Citations
60
Venue
International Conference on Machine Learning
Repository
https://github.com/rtaori/data_feedback
⭐ 18
Last Checked
1 month ago
Abstract
Datasets scraped from the internet have been critical to the successes of large-scale machine learning. Yet, this very success puts the utility of future internet-derived datasets at potential risk, as model outputs begin to replace human annotations as a source of supervision. In this work, we first formalize a system where interactions with one model are recorded as history and scraped as training data in the future. We then analyze its stability over time by tracking changes to a test-time bias statistic (e.g. gender bias of model predictions). We find that the degree of bias amplification is closely linked to whether the model's outputs behave like samples from the training distribution, a behavior which we characterize and define as consistent calibration. Experiments in three conditional prediction scenarios - image classification, visual role-labeling, and language generation - demonstrate that models that exhibit a sampling-like behavior are more calibrated and thus more stable. Based on this insight, we propose an intervention to help calibrate and stabilize unstable feedback systems. Code is available at https://github.com/rtaori/data_feedback.
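The feedback loop described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' code and does not reproduce their experiments. It assumes a single binary label, a "model" that is just the empirical label frequency of the current dataset, and a bias statistic defined as the fraction of model predictions equal to 1. The function name `simulate_feedback` and all of its parameters are hypothetical, chosen only for illustration.

```python
import random


def simulate_feedback(predict_mode, rounds=20, n_initial=1000,
                      n_human=100, n_model=400, p_true=0.6, seed=0):
    """Toy data feedback loop (illustrative assumption, not the paper's setup).

    Each round, a model is "trained" on the current dataset, its predictions
    are recorded and scraped back into the dataset together with a few fresh
    human labels, and the bias of its predictions is logged.
    """
    rng = random.Random(seed)

    # Initial dataset: human labels drawn from the true distribution.
    ones = sum(rng.random() < p_true for _ in range(n_initial))
    total = n_initial
    history = []

    for _ in range(rounds):
        # "Training": estimate P(y = 1) from everything scraped so far.
        p_hat = ones / total

        if predict_mode == "sample":
            # Sampling-like behavior: outputs look like draws from the
            # training distribution.
            preds = [rng.random() < p_hat for _ in range(n_model)]
        elif predict_mode == "argmax":
            # Always output the most likely label.
            preds = [p_hat > 0.5] * n_model
        else:
            raise ValueError(f"unknown predict_mode: {predict_mode!r}")

        # Bias statistic: fraction of predictions equal to 1.
        history.append(sum(preds) / n_model)

        # Model outputs and new human labels both enter the next dataset.
        human_ones = sum(rng.random() < p_true for _ in range(n_human))
        ones += sum(preds) + human_ones
        total += n_model + n_human

    return history


if __name__ == "__main__":
    for mode in ("sample", "argmax"):
        hist = simulate_feedback(mode)
        print(f"{mode:>6}: first={hist[0]:.3f} last={hist[-1]:.3f}")
```

Under these assumptions, the sampling predictor keeps its prediction bias close to `p_true`, while the argmax predictor outputs only the majority label and so pushes the dataset's label frequency further from the true rate each round. This mirrors, in a very simplified form, the abstract's claim that sampling-like models are more stable under feedback.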
Similar Papers
In the same crypt: Machine Learning
R.I.P.
👻
Ghosted
XGBoost: A Scalable Tree Boosting System
R.I.P.
👻
Ghosted
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
R.I.P.
👻
Ghosted
Semi-Supervised Classification with Graph Convolutional Networks
R.I.P.
👻
Ghosted
Proximal Policy Optimization Algorithms
R.I.P.
👻
Ghosted