WhyGen: Explaining ML-powered Code Generation by Referring to Training Examples

April 17, 2022 Β· Declared Dead Β· πŸ› 2022 IEEE/ACM 44th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)

πŸ‘» CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Weixiang Yan, Yuanchun Li arXiv ID 2204.07940 Category cs.SE: Software Engineering Citations 13 Venue 2022 IEEE/ACM 44th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion) Last Checked 3 months ago
Abstract
Deep learning has demonstrated great abilities in various code generation tasks. However, despite the great convenience for some developers, many are concerned that the code generators may recite or closely mimic copyrighted training data without user awareness, leading to legal and ethical concerns. To ease this problem, we introduce a tool, named WhyGen, to explain the generated code by referring to training examples. Specifically, we first introduce a data structure, named inference fingerprint, to represent the decision process of the model when generating a prediction. The fingerprints of all training examples are collected offline and saved to a database. When the model is used at runtime for code generation, the most relevant training examples can be retrieved by querying the fingerprint database. Our experiments have shown that WhyGen is able to precisely notify the users about possible recitations and highly similar imitations with a top-10 accuracy of 81.21%. The demo video can be found at https://youtu.be/EtoQP6850To.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

πŸ“œ Similar Papers

In the same crypt β€” Software Engineering

Died the same way β€” πŸ‘» Ghosted