Automatic Generation of Test Cases based on Bug Reports: a Feasibility Study with Large Language Models

October 10, 2023 · Declared Dead · 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Laura Plein, Wendkûuni C. Ouédraogo, Jacques Klein, Tegawendé F. Bissyandé
arXiv ID: 2310.06320
Category: cs.SE (Software Engineering)
Citations: 34
Venue: 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)
Last Checked: 3 months ago

Abstract

Software testing is a core discipline in software engineering where a large array of research results has been produced, notably in the area of automatic test generation. Because existing approaches produce test cases that either can be qualified as simple (e.g., unit tests) or require precise specifications, most testing procedures still rely on test cases written by humans to form test suites. Such test suites, however, are incomplete: they cover only parts of the project, or they are produced after the bug is fixed. Yet several research challenges, such as automatic program repair, as well as practitioner processes, build on the assumption that available test suites are sufficient. There is thus a need to break existing barriers in automatic test case generation. While prior work largely focused on random unit testing inputs, we propose to consider generating test cases that realistically represent complex user execution scenarios, which reveal buggy behaviour. Such scenarios are informally described in bug reports, which should therefore be considered as natural inputs for specifying bug-triggering test cases. In this work, we investigate the feasibility of performing this generation by leveraging large language models (LLMs) and using bug reports as inputs. Our experiments include the use of ChatGPT, as an online service, as well as CodeGPT, a code-related pre-trained LLM that was fine-tuned for our task. Overall, we experimentally show that bug reports associated with up to 50% of Defects4J bugs can prompt ChatGPT to generate an executable test case. We show that even new bug reports can indeed be used as input for generating executable test cases. Finally, we report experimental results which confirm that LLM-generated test cases are immediately useful in software engineering tasks such as fault localization as well as patch validation in automated program repair.
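
The abstract does not reproduce the prompts or tooling used, so the following is only a minimal sketch of the general idea: turn a bug report into a prompt, then pull a candidate test out of the model's reply. The function names `build_prompt` and `extract_java_test`, the prompt wording, and the assumption that the reply wraps code in a fenced block are all hypothetical and not taken from the paper. No LLM is called here, and checking that a test is actually executable would additionally require compiling and running it against the target project, which is not shown.

```python
import re
from typing import Optional

# Built indirectly so the fence marker never appears literally in this example.
FENCE = "`" * 3


def build_prompt(title: str, description: str) -> str:
    """Assemble a hypothetical prompt asking an LLM for a bug-reproducing JUnit test."""
    return (
        "Write a JUnit test case that reproduces the bug described below.\n"
        "Return only Java code.\n\n"
        f"Title: {title.strip()}\n"
        f"Description: {description.strip()}\n"
    )


def extract_java_test(reply: str) -> Optional[str]:
    """Return the first fenced code block in the reply, or None if there is none.

    Assumes (hypothetically) that the model wraps its code in a fenced block,
    optionally tagged 'java'.
    """
    pattern = re.compile(FENCE + r"(?:java)?\s*\n(.*?)" + FENCE, re.DOTALL)
    match = pattern.search(reply)
    return match.group(1).strip() if match else None
```

In this sketch the bug report's title and description are the only inputs, mirroring the abstract's point that bug reports are natural specifications for bug-triggering tests. A reply with no code block yields `None`, which a caller could treat as a failed generation attempt.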