The Journey, Not the Destination: How Data Guides Diffusion Models

December 11, 2023 · Entered Twilight · 🏛 arXiv.org

💤 TWILIGHT: Eternal Rest
Repo abandoned since publication

Repo contents: .gitignore, LICENSE, README.md, assets, diffusion_trak, examples, setup.py

Authors: Kristian Georgiev, Joshua Vendrow, Hadi Salman, Sung Min Park, Aleksander Madry
arXiv ID: 2312.06205
Category: cs.CV (Computer Vision)
Cross-listed: cs.LG
Citations: 36
Venue: arXiv.org
Repository: https://github.com/MadryLab/journey-TRAK (⭐ 25)
Last Checked: 1 month ago
Abstract
Diffusion models trained on large datasets can synthesize photo-realistic images of remarkable quality and diversity. However, attributing these images back to the training data (that is, identifying specific training examples which caused an image to be generated) remains a challenge. In this paper, we propose a framework that: (i) provides a formal notion of data attribution in the context of diffusion models, and (ii) allows us to counterfactually validate such attributions. Then, we provide a method for computing these attributions efficiently. Finally, we apply our method to find (and evaluate) such attributions for denoising diffusion probabilistic models trained on CIFAR-10 and latent diffusion models trained on MS COCO. We provide code at https://github.com/MadryLab/journey-TRAK.
Community shame:
Not yet rated

📜 Similar Papers

In the same crypt: Computer Vision