LLM Dataset Inference: Did you train on my dataset?
June 10, 2024 · Entered Twilight · Neural Information Processing Systems
Repo contents: .gitignore, LICENSE, README.md, analysis.py, correction_script.py, data_creator.py, dataloader.py, demo.ipynb, di.py, files, linear_di.py, metrics.py, requirements.txt, results_reader.py, scripts, selected_features.py, transform.py, utils.py
Authors
Pratyush Maini, Hengrui Jia, Nicolas Papernot, Adam Dziedzic
arXiv ID
2406.06443
Category
cs.LG: Machine Learning
Cross-listed
cs.CL, cs.CR
Citations
106
Venue
Neural Information Processing Systems
Repository
https://github.com/pratyushmaini/llm_dataset_inference/
⭐ 42
Last Checked
1 month ago
Abstract
The proliferation of large language models (LLMs) in the real world has come with a rise in copyright cases against companies for training their models on unlicensed data from the internet. Recent works have presented methods to identify if individual text sequences were members of the model's training data, known as membership inference attacks (MIAs). We demonstrate that the apparent success of these MIAs is confounded by selecting non-members (text sequences not used for training) belonging to a different distribution from the members (e.g., temporally shifted recent Wikipedia articles compared with ones used to train the model). This distribution shift makes membership inference appear successful. However, most MIA methods perform no better than random guessing when discriminating between members and non-members from the same distribution (e.g., in this case, the same period of time). Even when MIAs work, we find that different MIAs succeed at inferring membership of samples from different distributions. Instead, we propose a new dataset inference method to accurately identify the datasets used to train large language models. This paradigm sits realistically in the modern-day copyright landscape, where authors claim that an LLM is trained over multiple documents (such as a book) written by them, rather than one particular paragraph. While dataset inference shares many of the challenges of membership inference, we solve it by selectively combining the MIAs that provide positive signal for a given distribution, and aggregating them to perform a statistical test on a given dataset. Our approach successfully distinguishes the train and test sets of different subsets of the Pile with statistically significant p-values < 0.1, without any false positives.
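The aggregation idea in the abstract (keep only the MIAs that carry signal for a given distribution, combine them, then run a statistical test over the whole dataset) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper's repository: the function names, the synthetic Gaussian scores, the convention that a lower score means "more member-like", the `min_gap` selection threshold, the unweighted sum used as the aggregator, and the permutation test used as the dataset-level test.

```python
import random
from statistics import fmean, pstdev


def select_features(suspect, validation, min_gap=0.5):
    """Keep the (hypothetical) MIA features whose mean score on the suspect
    split is lower than on the validation split by >= min_gap pooled std devs."""
    selected = []
    for j in range(len(suspect[0])):
        s = [row[j] for row in suspect]
        v = [row[j] for row in validation]
        spread = pstdev(s + v) or 1.0  # guard against a zero spread
        if (fmean(v) - fmean(s)) / spread >= min_gap:
            selected.append(j)
    return selected


def aggregate(rows, features):
    """Combine the selected feature scores of each sample into one score
    (an unweighted sum; an assumption chosen for simplicity)."""
    return [sum(row[j] for j in features) for row in rows]


def permutation_p_value(suspect_scores, validation_scores, n_perm=2000, seed=0):
    """One-sided permutation test. H0: suspect scores are not lower than
    validation scores, i.e. the suspect set was not trained on."""
    rng = random.Random(seed)
    observed = fmean(validation_scores) - fmean(suspect_scores)
    pooled = list(suspect_scores) + list(validation_scores)
    n = len(suspect_scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if fmean(pooled[n:]) - fmean(pooled[:n]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def synthetic_scores(n, shifts, seed):
    """Fake per-sample MIA scores: one Gaussian column per feature, with the
    mean lowered by `shift` to mimic a feature that separates members."""
    rng = random.Random(seed)
    return [[rng.gauss(-shift, 1.0) for shift in shifts] for _ in range(n)]


# Synthetic data: features 0 and 2 carry signal on the suspect set, feature 1 does not.
suspect = synthetic_scores(400, [1.2, 0.0, 1.2], seed=1)
validation = synthetic_scores(400, [0.0, 0.0, 0.0], seed=2)

# Choose features on one half of each set, then test on the held-out half.
selected = select_features(suspect[:200], validation[:200])
p_value = permutation_p_value(
    aggregate(suspect[200:], selected),
    aggregate(validation[200:], selected),
)
print(selected, p_value)
```

Selecting features on one split and testing on a disjoint split keeps the selection step from inflating the significance of the final test. The separation in this synthetic data is far larger than what real MIA scores would likely show, so the resulting p-value says nothing about real models.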
Similar Papers
In the same crypt: Machine Learning
XGBoost: A Scalable Tree Boosting System
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Semi-Supervised Classification with Graph Convolutional Networks
Proximal Policy Optimization Algorithms