Copyright Violations and Large Language Models

October 20, 2023 · Declared Dead · 🏛 Conference on Empirical Methods in Natural Language Processing

💀 CAUSE OF DEATH: 404 Not Found
Code link is broken/dead
Authors: Antonia Karamolegkou, Jiaang Li, Li Zhou, Anders Søgaard
arXiv ID: 2310.13771
Category: cs.CL (Computation & Language)
Cross-listed: cs.AI
Citations: 117
Venue: Conference on Empirical Methods in Natural Language Processing
Repository: https://github.com/coastalcph/CopyrightLLMs
Last Checked: 1 month ago
Abstract
Language models may memorize more than just facts, including entire chunks of texts seen during training. Fair use exemptions to copyright laws typically allow for limited use of copyrighted material without permission from the copyright holder, but typically for extraction of information from copyrighted materials, rather than verbatim reproduction. This work explores the issue of copyright violations and large language models through the lens of verbatim memorization, focusing on possible redistribution of copyrighted text. We present experiments with a range of language models over a collection of popular books and coding problems, providing a conservative characterization of the extent to which language models can redistribute these materials. Overall, this research highlights the need for further examination and the potential impact on future developments in natural language processing to ensure adherence to copyright regulations. Code is at https://github.com/coastalcph/CopyrightLLMs.

📜 Similar Papers

In the same crypt: Computation & Language

🌅 Old Age

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, ... (+6 more)

cs.CL · 🏛 NeurIPS · 📚 166.0K cites · 8 years ago

Died the same way: 💀 404 Not Found