Will Large-scale Generative Models Corrupt Future Datasets?

November 15, 2022 ยท Declared Dead ยท ๐Ÿ› IEEE International Conference on Computer Vision

๐Ÿ’€ CAUSE OF DEATH: 404 Not Found
Code link is broken/dead
Authors Ryuichiro Hataya, Han Bao, Hiromi Arai arXiv ID 2211.08095 Category cs.CV: Computer Vision Citations 72 Venue IEEE International Conference on Computer Vision Repository https://github.com/moskomule/dataset-contamination} Last Checked 1 month ago
Abstract
Recently proposed large-scale text-to-image generative models such as DALL$\cdot$E 2, Midjourney, and StableDiffusion can generate high-quality and realistic images from users' prompts. Not limited to the research community, ordinary Internet users enjoy these generative models, and consequently, a tremendous amount of generated images have been shared on the Internet. Meanwhile, today's success of deep learning in the computer vision field owes a lot to images collected from the Internet. These trends lead us to a research question: "\textbf{will such generated images impact the quality of future datasets and the performance of computer vision models positively or negatively?}" This paper empirically answers this question by simulating contamination. Namely, we generate ImageNet-scale and COCO-scale datasets using a state-of-the-art generative model and evaluate models trained with "contaminated" datasets on various tasks, including image classification and image generation. Throughout experiments, we conclude that generated images negatively affect downstream performance, while the significance depends on tasks and the amount of generated images. The generated datasets and the codes for experiments will be publicly released for future research. Generated datasets and source codes are available from \url{https://github.com/moskomule/dataset-contamination}.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Computer Vision

Died the same way โ€” ๐Ÿ’€ 404 Not Found